Some Effects of Prolonged Experience in Communication
Nets¹


Marvin E. Shaw and Gerard H. Rothschild 
The Johns Hopkins University 


Numerous experimental studies have dem- 
onstrated that the arrangement of communi- 
cation channels among the members of a 
group has a significant effect upon group performance
and satisfaction (1, 2, 3, 5, 6, 7, 9,
11, 12, 13). Generally speaking, the communication
net which permits more nearly
equal participation by the group members re- 
sults in higher member satisfaction and, when 
the task is to solve a relatively complex prob- 
lem, smaller time-and-error scores than does 
a communication net which restricts the par- 
ticipation of some group members more than 
others. 

These experiments have all used relatively 
short experimental periods, usually one ses- 
sion of about 50 minutes, although some have 
required one session of 2 to 2½ hours (7, 13).
It is possible that the observed effects of the 
communication net are temporary in nature, 
and that such differences would disappear 
(or perhaps reverse in direction) if groups 
were required to function on a day-to-day 
basis. The present experiment was designed 
to check on this possibility. 


Method 
Apparatus 


The apparatus used in this experiment was the 
same as that described in earlier reports (9, 11). It 
consists of four cubicles which are connected with 
each other by slots through the walls separating 
them. The Ss communicate by writing messages on
3 × 5 cards and passing them through slots. Various
communication nets can be imposed by closing the
appropriate slots. The three communication nets
used in this experiment are shown in Fig. 1.


1 This experiment was done under Contract N5-
ori-166, Task Order 1, between the Office of Naval
Research and The Johns Hopkins University. This
is Report No. 166-I-202, Project Designation No.
NR 145-089, under that Contract.


Procedure 


There were 20 problems requiring simple arith- 
metical computations similar to those described in 
earlier reports (9, 12). Eight items of information 
were needed to solve each problem. At the start of 
any test, each S was given two of these items. The 
order of presentation of problems to a given group 
was random, except that each order of presentation 
in one net was replicated in each of the other two 
nets. 

All Ss were male undergraduates at The Johns 
Hopkins University. They were paid for their serv- 
ices. Eight groups of four Ss each were randomly 
assigned to each of the three nets. The experimental 
design required that each group meet each day (ex- 
cluding Saturdays and Sundays) at approximately 
the same time for a total of ten days. Each group 
solved two problems each day. At the beginning of 
the experiment, Ss were told the general nature of 
the task, the method of communication, and who 
could communicate with whom. 
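
The counterbalancing described in this section can be sketched as follows. This is only an illustration under stated assumptions, not the authors' actual randomization procedure: the function name, the seed, and the use of Python's `random` module are hypothetical, and only the counts (20 problems, 3 nets, 8 groups per net, 2 problems per day for 10 days) come from the text.

```python
import random

NETS = ["star", "slash", "comcon"]
N_PROBLEMS = 20
GROUPS_PER_NET = 8
PROBLEMS_PER_DAY = 2


def build_schedule(seed=0):
    """Return {net: [group schedule]}; each group schedule is a list of daily problem pairs."""
    rng = random.Random(seed)  # hypothetical seed, used only for reproducibility
    # One random order of the 20 problems for each of the 8 replications.
    orders = [rng.sample(range(1, N_PROBLEMS + 1), N_PROBLEMS)
              for _ in range(GROUPS_PER_NET)]
    schedule = {}
    for net in NETS:
        # Each order used in one net is replicated in the other two nets.
        schedule[net] = [
            [order[i:i + PROBLEMS_PER_DAY]
             for i in range(0, N_PROBLEMS, PROBLEMS_PER_DAY)]
            for order in orders
        ]
    return schedule


schedule = build_schedule()
print(len(schedule["star"]), "groups per net;", len(schedule["star"][0]), "days per group")
```

Sharing one set of orders across the three nets keeps problem order from being confounded with the net, which appears to be the purpose of the replication described above.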

At the end of the last session, Ss were required to 
complete a questionnaire which asked for: (a) rat- 
ings, on an 11-point scale, of over-all satisfaction 
with their job in the group, (b) whether the group 
had a leader, and if so who occupied this position, 
and (c) whether the group developed a system, and 
if so what was the nature of the system. After this
questionnaire had been collected by E, Ss were asked
to indicate their satisfaction with the job on a day-
to-day basis. The device for obtaining this information
consisted of a grid with the 11-point rating
scale along the ordinate and days along the abscissa.
S rated his satisfaction with the group situation for
each day (as he remembered it) by simply checking
the appropriate space. This gave a graphic picture
of the course of satisfaction over time in the various
nets.

Fig. 1. The experimental nets. The numbers within circles are the Independence scores for each position, computed according to a formula given in a previous report (10).

Fig. 2. Mean time per problem as a function of practice in three communication nets.


Results 


The results of this experiment will be
discussed under the following headings: (a)
time, (b) message units, (c) errors, (d) ratings
of satisfaction, (e) emergence of leadership,
and (f) organization. In all of the
analyses reported, there were large differ- 
ences among groups and individuals treated 
alike, and most of these differences were sta- 
tistically reliable. This finding agrees with 
the findings of previous investigations and 
appears to be of no great significance for the 
present study. 

Time. Time was measured from the “go”
signal to the time the last person in the group
had thrown his switch indicating that he knew
the answer. The means of these scores are
shown in Fig. 2. Analysis of variance²
yielded significant Fs for nets (p < .05) and
days (p < .001). Tukey’s (14) gap test³
indicated that the comcon differed signifi- 
cantly from the star and the slash, but that 
these latter two did not differ significantly. 


2 This analysis was performed upon scores which 
had been transformed by the square-root transfor- 
mation to achieve homogeneity of variance. 

3 In all applications of the gap test the .05 level
of confidence was accepted. 


This finding is at variance with a previous in- 
vestigation which found the slash faster than 
the star (9). The earlier study, however, re- 
quired Ss to solve only three problems during 
a single session; Fig. 2 shows that the slash 
was faster than the star during the first three 
days (six problems). Thus, the previous ex- 
perimental results are in agreement with these 
if we consider only the first few problems. 

A significant amount of learning occurred 
in all nets, in agreement with previous investi- 
gations. The gap test revealed that time 
scores decreased significantly from day to day 
during the first five days, but that decreases 
thereafter were not significant. 

The average time required by the Ss in the 
various positions within nets did not differ 
significantly, although differences were in the 
expected direction; i.e., Ss in positions having
the higher Independence scores⁴ required
less time to reach a solution than did Ss in
positions having lower Independence scores.
Mean times and Independence scores correlated
−.313, which, however, was not statistically
reliable. This finding was not unexpected
since Independence scores had not
been shown to be related to time scores in 
previous experiments. 

Message units. Contents of the messages 
transmitted by Ss in each group were ana- 
lyzed into units by defining a message unit 
as a simple sentence or any meaningful part 
of a complex or compound sentence. The re- 
sults of this analysis are shown in Fig. 3. 
As in the case of the time scores, the nets and 
days terms were significant (p < .001 in each 
case). These results are not in complete 
agreement with the results of short-term ex- 
periments. The star required fewer mes- 
sages than either the slash or the comcon as 
had been expected, but contrary to expecta- 
tions, the comcon required fewer messages 
than did the slash. This discrepancy fits in 
with that found in connection with the time 
scores; a possible reason for these findings is 
discussed later. 

Number of messages decreased with time in
each of the three nets, presumably because
Ss learned to communicate more efficiently.
The gap test showed that message units decreased
significantly only for the first four
days.

4 The Independence score is a measure of the degree
of freedom of action permitted the individual
as a result of his position in the communication net.
A formula for computing this score has been given
in a previous report (10).

Differences between message units trans- 
mitted by Ss in various positions within nets 
agreed with expectations based upon previous 
findings. In the star, Position A (see Fig. 1) 
transmitted significantly (p < .01) more mes- 
sage units per problem (mean = 12.1) than 
did Positions B, C, and D (mean = 3.8). 
In the slash, Positions A and C transmitted
significantly (p [illegible]) more message units
per problem (mean = 14.8) than did Positions
B and D (mean = [illegible].7). In the comcon,
no significant differences were found. Mean
message units per problem and Independence 
scores correlated .251. Again, this was in the 
expected direction but was not significant. 

Errors. There were so few errors com- 
mitted by any of the groups that it was not 
possible to evaluate them statistically. The 
observed differences were consistent with 
previous findings in that the star produced 
more errors (mean per group = 2.0) than 
did either the slash (mean per group = 1.5)
or the comcon (mean per group = 1.6).


Ratings of satisfaction. The mean ratings 
of over-all satisfaction were not significantly 
different for the three nets, although differ- 
ences were in the expected direction (means 
were 7.56, 7.94, and 8.69 for the star, the 
slash, and the comcon, respectively). Differences
between positions within nets were
significant only in the case of the star;
Position A rated over-all satisfaction significantly
(p < .05) higher (mean = 9.63) than
did Positions B, C, and D (mean = 6.96).
Mean ratings of over-all satisfaction and Independence
scores correlated .953; this was
as expected and is statistically significant
(p < .01).

Fig. 3. Mean number of message units transmitted per problem as a function of practice in the communication nets.

Fig. 4. Mean ratings of satisfaction by Ss within nets as a function of experience in the three nets.

The results of the day-to-day ratings of 
satisfaction are shown in Fig. 4. Analysis of 
variance yielded significant Fs for nets (p 
< .05) and for days (p < .001). In agree- 
ment with expectations from previous experi- 
ments, the Ss in the comcon rated satisfac- 
tion higher than did Ss in the slash who in 
turn rated satisfaction higher than did Ss in 
the star. 

Differences in the course of satisfaction 
over time are especially interesting. The gap 
test revealed that the ratings increased sig- 
nificantly up to the third day and thereafter 
showed no significant differences between 
days. This certainly would not have been 
predicted from experimental evidence avail- 
able heretofore. Leavitt appears to be the 
only previous investigator who attempted to 
measure satisfaction as a function of experience.
He reported “trends of increasing
satisfaction in the circle and decreasing satisfaction
in the wheel” (6, p. 44).⁵ It should
be remembered, however, that he asked for
indications of satisfaction during a single session
rather than between sessions as in the
present experiment. Also, his groups were
required to solve a different type of problem,
a variable which has been shown to affect
ratings of satisfaction (11).

5 Leavitt’s wheel is called “the star” in the present
report.

Emergence of leadership. A leader is said 
to have emerged in a group if three or more 
of the four Ss named the same person in re- 
sponse to the question, “Did your group have 
a leader? If so, who?” According to this 
criterion, a leader emerged in only two groups 
in the comcon and in the slash, whereas a 
leader emerged in all groups in the star. The 
difference between the star and the other two 
nets is statistically reliable (p< .05) and 
agrees with previous findings. 
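
The emergence criterion stated above (three or more of the four Ss naming the same person) can be written out as a short sketch. The function name and the encoding of responses are hypothetical; an S who reports no leader is coded as `None`, and the case of an S naming two persons is deliberately left out.

```python
from collections import Counter


def emergent_leader(nominations, threshold=3):
    """Return the position named by at least `threshold` members, else None.

    `nominations` has one entry per group member: the position that member
    named as leader, or None if the member said the group had no leader.
    """
    counts = Counter(n for n in nominations if n is not None)
    if not counts:
        return None
    position, votes = counts.most_common(1)[0]
    return position if votes >= threshold else None


# Three of four Ss name Position A: a leader is said to have emerged.
print(emergent_leader(["A", "A", "A", None]))
# Nominations split between Positions A and C: no leader by this criterion.
print(emergent_leader(["A", "C", "A", "C"]))
```

With four members and a threshold of three, at most one position can satisfy the criterion, so the most frequent nomination is the only candidate that needs checking.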

Although the comcon and the slash did not 
differ in this respect, the reasons why a leader 
did not emerge are probably different for the 
two nets. Spontaneous explanatory comments
written in on the questionnaire by Ss,
as well as spontaneous comments to E, indicated
that a leader did not emerge in the
slash primarily because of the conflict be- 
tween the two persons occupying Positions A 
and C, whereas in the comcon it was due to 
a feeling of equality among the group mem- 
bers. This interpretation is further bolstered 
by the fact that in four of the eight groups 
in the slash at least one person (eight Ss al- 
together) named two persons as the leader— 
those in Positions A and C. Only one person 
in the comcon named more than one person 
as the leader, and none in the star did so. 
This finding fits in rather well with the find- 
ings in regard to organizational development 
to be presented in the next section. 

We were also interested in the frequency 
with which Ss in the various positions would 
be named the leader as a function of the In- 
dependence score of that position. This 
“recognition of leadership” was found to be
highly correlated with Independence scores 
(r = .968, p < .01), in agreement with ex- 
pectation and with previous research. 

Organization. The pattern of organization
which developed in the various groups was
determined by analysis of the contents and
distribution of the messages transmitted during
the solution of problems and by the responses
of Ss to the question, “Did your
group develop a system? If so, describe it
briefly.” Altogether, four types of organizational
patterns were distinguished: (a) each-to-all,
in which all information was transmitted
to all Ss and then each S solved the
problem independently; (b) each-to-all plus
check, which was the same as (a) except that
answers were passed to other Ss for checking
before being accepted; (c) central, in which
all information was sent to one person who
solved the problem and sent the answer to
other Ss who merely accepted it; and (d)
central plus check, which was the same as
(c) except that the answer was checked by
at least one other S in the group. A fifth
category, labeled “No pattern,” includes all
of those cases where no recognizable pattern
emerged; e.g., groups in which all Ss said
they did not have a system and in which
none could be discerned in the pattern of
message transmission. The results of this
analysis are given in Table 1.

Table 1

Frequency of Occurrence of Organizational Patterns
in Each of the Three Nets

Organizational patterns, in column order: Each-to-All; Each-to-All plus Check; Central; Central plus Check; No Pattern

Nets      Frequencies (only three cells per row are legible in the source)
Star      0   0   5
Slash     2   0   1
Comcon    2   3   3

Nets differed significantly (p < .01) in fre- 
quency with which these patterns emerged. 
The comcon groups showed predominantly 
“each-to-all” organization, whereas all star 
groups developed the “central” type, with 
the checking procedure being initiated by 
about half of the groups in each net. Most 
of the slash groups fell into the “no pat- 
tern” classification. Characteristically, mes- 
sages were sent at random until someone 
happened to get all of the information, or 
each S sent all of his original information to
all other Ss with whom he was connected
directly without relaying to less fortunate
group members when necessary. In the latter
case, one or the other of the two peripheral
persons found that he did not have
enough information to solve the problem.
This fits in with the conflict between the two 
potential leaders mentioned above. Also, the 
fact that the slash was slower and sent more 
message units than had been expected follows 
from this lack of organization. 


Discussion 


In discussing the results presented above, 
we shall be interested in two main questions: 
(a) changes in group behavior in the various 
nets as a function of experience, and (b) the
extent to which the present results agree or 
disagree with the results of previous, short- 
term experiments. 

Groups solved problems faster, sent fewer 
messages, and became better satisfied with 
the task as a function of experience in the 
experimental situation. These findings hold 
for all nets and apparently were due to Ss 
learning how to operate in the previously un- 
familiar situation. That is to say, Ss learned 
to send only relevant information and, except
in the case of the slash, to use some system
for routing information. Performance
thus became more in line with expectations
and satisfaction increased accordingly. This
interpretation assumes that college students
expect to do very well on simple tasks of the
sort used in this experiment, and that satisfaction
is a function of the degree of discrepancy
between expectation and actual accomplishment.
This latter notion, of course, is
essentially that suggested by Freud (4, p. 16)
and more recently by McClelland et al. (8).

Looking at Figs. 2 and 4 one might sus- 
pect that Ss were rating satisfaction on the 
basis of length of time in the experimental 
situation. This hypothesis was rejected because
individual time scores and ratings
failed to correlate significantly (r = −.047).
Likewise, total time per session and average
rating per group correlated only −.003.

We turn now to the second question. Previ- 
ous experimental results indicated that the 
comcon should be faster, transmit more mes- 
sage units, and be more satisfying than the 
slash, which in turn should be faster, transmit
more message units, and be more satisfying
than the star. Actually, these relationships
were found only for the ratings of
satisfaction. The comcon was faster than
either the star or the slash, whereas these 
latter two did not differ significantly in this 
respect. Likewise, the star transmitted fewer 
message units than did either the comcon or 
the slash, but the slash sent more messages 
than did the comcon. In other words, the 
slash required both more time and messages 
than had been expected. 

The reasons for these discrepancies are
probably very complex, but it seems to us
that they are due largely to the failure of the
slash groups to develop any effective organizational
pattern. As we have indicated
previously, there were reasonably clear evidences
of conflict between the two logically
possible leaders in the slash groups. This
conflict apparently prevented effective organization
which in turn resulted in erratic
message transmission and slower problem
solution. (Also, the fact that either Position
A or Position C could be the mediator be- 
tween Positions B and D probably led to 
some confusion as to who would perform this 
function; consequently, Positions B and D 
were sometimes left without enough informa- 
tion to solve the problem themselves and no 
one sent them the answer.) The rigid struc- 
ture of the star suggested only one effective 
organizational pattern—the central pattern, 
and the complete lack of structure in the 
comcon indicated a need for organization of 
some type (or perhaps a lack of need for or- 
ganization since even if Ss merely sent out 
all of their original information over all 
available channels, the each-to-all pattern 
would result and the task would be effec- 
tively completed), whereas the structure of 
the slash suggested at least two possible or- 
ganizational patterns with no evident means 
of discriminating between them. 

Differences among positions within nets 
were in all cases in the direction expected 
from the results of previous research, and in 
most cases the differences were statistically
reliable. The relationship between Independence
and behavioral measures agreed with
previous results for ratings of satisfaction,
recognition of leadership, and time scores,
but did not agree for message units transmitted.

Previous explanations in terms of freedom 
of action (6, 9) and saturation (5, 12) ap- 
pear adequate to account for these results. 


Summary 

This experiment studied the effects of cer- 
tain communication nets upon group behav- 
ior when groups were required to operate in 
the same net over a period of several days. 
The communication nets were the comcon, 
the slash, and the star. Eight groups of four 
Ss each were assigned to each of the three 
nets. Each group solved two simple arith- 
metic-type problems each day for a period of 
ten days. Sessions were scheduled at approxi- 
mately the same hour each day, and were 
scheduled on successive days except that 
Saturdays and Sundays were excluded. At 
the end of the experiment, Ss filled out ques- 
tionnaires which asked (a) for ratings of 
satisfaction with job in the group both on an 
over-all basis and on a day-to-day basis, (b)
whether the group had a leader, and (c) 
whether the group developed a system. 

The results were as follows: (a) All groups
solved problems faster, sent fewer messages,
and rated satisfaction higher as a function of
sessions in the net. (b) As expected, the
comcon groups rated satisfaction higher than
the slash groups, who rated satisfaction higher
than did the star groups. (c) The comcon
was faster but sent more messages than did
the star, again as expected, but the slash was
slower and sent more messages than did
either the comcon or the star (although differences
between the star and the slash with
respect to time scores were not statistically
reliable). (d) A leader emerged more frequently
in the star than in either the comcon
or the slash. (e) The comcon groups developed
predominantly “each-to-all” organization,
and the star developed predominantly
“central” organization, but the slash appeared
to be almost completely disorganized.


Received December 27, 1955. 
A Null-Point Discontinuous Electrical Pursuit Meter


R. J. Shephard 


RAF Institute of Aviation Medicine, Farnborough, Hants, England 


Electrical pursuit meters were used fairly 
extensively during the recent war (5, 6, 2, 
8). Theoretically this type of apparatus has 
a number of attractions—the tasks can be 
varied, interesting, and fairly closely related 
to the problem of flying an airplane, frequent 
readings can be obtained and scoring can 
be highly objective. In practice, it did not 
prove a success (7). Several days of training
were sometimes necessary to attain a constant
performance, and difficulties were encountered
with circuit variables under the
rigorous conditions of heat, humidity, and re- 
duced pressure demanded by aviation physi- 
ology. 

The apparatus now to be described is an 
electrical pursuit meter of discontinuous type. 
The design is based on a Wheatstone bridge 
network, as suggested by Vere.¹ It is simple
in construction, sturdy in structure, and easy 
to operate. Further, since it embodies a null- 
balance principle, difficulties due to circuit 
variables are largely eliminated. It is ca- 
pable of examining a number of parameters 
of psychomotor performance simultaneously, 
and can be readily adapted to meet a variety 
of test situations. These favorable features 
suggest there may yet be a place in psy- 
chomotor research for electrical pursuit me- 
ters of this type; accordingly a careful evalu- 
ation of the present machine has been made 
under normal resting conditions and during 
the stress situation of high pressure breathing. 


Methods 


The Apparatus. The subject (S) is confronted 
by a plain panel set at an angle of about 15° to 
the vertical (Fig. 1). Mounted in the upper part 
of the panel are two 0-20 voltmeters, V1 and V2,
separated by a distance of one foot. (This separation
prevents accurate simultaneous reading of the
two dials.) The operator can produce a variety of
readings of V1, and S is required to produce corresponding
readings of V2 by operation of a small,
centrally placed control knob. 


1 Personal communication, 1955. 


The circuit is based on the familiar Wheatstone 
bridge (Fig. 2).² One arm of the bridge carries a
uniselector allowing connection to nine different resistors.
The voltage across this is recorded by V1.
A second arm of the bridge carries a rheostat, Rs.
This is operated by S, the voltage across it being recorded
by a second (matched) voltmeter, V2. The
full movement of the rheostat is 270°, and with the
small control knob several wrist movements are
needed to cover this range. At one end of the scale 
a large voltage change is produced with a small 
amount of rotation, while at the other end of the 
scale a comparatively small change is produced by 
a large movement of the control knob. This allows 
the introduction of tasks of graded difficulty. 

The remaining two arms of the bridge consist 
simply of matched resistors. Thus the current flowing
through the galvanometer, G, is proportional to
the difference in voltmeter readings V1 − V2. When
S has completed his task (that is to say, V1 = V2)
no current flows through the galvanometer; the final
end point is therefore independent of variations in
resistor values. It is important to obtain a galvanometer
of high sensitivity in order to make the two
halves of the bridge virtually independent; with the
present system, no detectable change in the reading
of V1 is noted with adjustment of V2.
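
The null-balance behavior can be illustrated with an idealized model in which each half of the bridge is a simple voltage divider. This is a sketch under assumptions, not the author's circuit: the supply voltage, the resistor values, and the function names are hypothetical, and the galvanometer is assumed to draw negligible current.

```python
def arm_voltage(supply, r_variable, r_fixed):
    """Voltage across the variable resistor in one half of the bridge."""
    return supply * r_variable / (r_variable + r_fixed)


def imbalance(supply, r_selected, r_rheostat, r_fixed):
    """V1 - V2; the galvanometer deflection is taken as proportional to this."""
    v1 = arm_voltage(supply, r_selected, r_fixed)
    v2 = arm_voltage(supply, r_rheostat, r_fixed)
    return v1 - v2


SUPPLY = 28.0  # hypothetical supply voltage

for r_fixed in (100.0, 150.0, 220.0):
    # With the rheostat matched to the selected resistor the bridge is
    # balanced, whatever the common value of the two matched fixed arms.
    print(r_fixed, imbalance(SUPPLY, 330.0, 330.0, r_fixed))

# The same 10-ohm change moves V2 much more at the low end of the rheostat
# than at the high end, which is one way tasks of graded difficulty can arise.
low_end = arm_voltage(SUPPLY, 20.0, 100.0) - arm_voltage(SUPPLY, 10.0, 100.0)
high_end = arm_voltage(SUPPLY, 510.0, 100.0) - arm_voltage(SUPPLY, 500.0, 100.0)
print(low_end > high_end)
```

In this model the balance point depends only on the rheostat matching the selected resistor, so drift that affects both matched arms equally does not shift the end point.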


Design of the Experiment 


Test procedure. The nine resistors attached to the 
uniselector give a possible 80 tasks, but for the pres- 
ent purpose 16 were considered sufficient. These 
were presented at the rate of one every four seconds. 
They varied in severity according to the size of the 
initial stimulus (movement of V1), the rotation required,
and the sensitivity of the rheostat in the region
of final adjustment.

Subjects. The Ss were drawn from the medical 
laboratory staff. Six adults (5 male, 1 female) took 
part in the experiments under normal resting con- 
ditions. They were seated with the panel dials at 
approximately eye level, and the control knob at a 
comfortable distance from the body chosen by the
subject. Before the first test they were each shown 
a typical tracing, and it was explained that the ap- 
paratus would measure both the speed and the ac- 
curacy with which they performed the various tasks. 
Each S attended at six different times on consecu- 
tive days, and on each occasion operated the ma- 
chine for a little over 5 minutes. 

Nine male Ss took part in the pressure-breathing
experiments, five attending on more than one occasion.
The routine for each visit consisted of a
3-min. control run, 2-min. run during venous congestion
(cuff inflated to 78 mm.Hg. around right
arm), 2-min. run wearing pressure-breathing helmet,
4-min. pressure breathing (sometimes shortened owing
to subjective distress), and final 2-min. control
run.


2 Details of the circuit and apparatus may be obtained
from the author on request.


Results 


Factors Affecting Performance Under Normal 
Resting Conditions 


Analysis of the tracing. Three parameters
have been measured for each individual task.
These are:

1. Time taken by S to initiate movement
of the rheostat (“initial response time”).

2. Total time elapsing before completion of
the task (“total response time”).

3. Final error of matching accepted by the
subject.


For each S 480 values have been obtained 
for each parameter. Performance has been 
analyzed with particular reference to the ef- 
fects of experience, time of day, and intertask 
variation. 
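
The figure of 480 values per S can be related to the test procedure by simple arithmetic. The sketch below assumes six visits per S and one value per task presentation; the resulting number of passes through the 16 tasks per visit is an inference from these figures, not something stated in the text.

```python
TASKS_PER_CYCLE = 16      # tasks in the test series
SECONDS_PER_TASK = 4      # presentation rate: one task every four seconds
VISITS = 6                # assumed: one visit on each of six days
VALUES_PER_SUBJECT = 480  # values obtained per parameter for each S

values_per_visit = VALUES_PER_SUBJECT // VISITS
cycles_per_visit = values_per_visit // TASKS_PER_CYCLE
minutes_per_visit = values_per_visit * SECONDS_PER_TASK / 60

print(values_per_visit, cycles_per_visit, round(minutes_per_visit, 2))
```

Under these assumptions each visit lasts a little over 5 minutes, which agrees with the session length given under Subjects.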

Time of day. Summing performance over 
individual days, a two-way classification of 
Ss against time of testing can be made. A 
simple analysis of variance for data classified 
in this way (6) shows there is a highly sig- 
nificant component (P < .001) attributable 
to intersubject differences, but there is no sig- 
nificant variance attributable to the time of 
day when the observations were made. 


Fig. 1. External appearance of pursuit meter.


Experience. Since time of testing is not 
important, it is permissible to rearrange the 
data to yield a two-way classification of Ss 
against day of testing. Analyzing the vari- 
ance as before, there is no significant com- 
ponent attributable to “days” except in the 
case of “error.” Variations of error occurred
mainly on two days, Day 4 showing a lower 
accuracy and Day 5 a higher accuracy than 
the other four days. On any one day, there 
were only slight fluctuations of performance 
from minute to minute. The absence of any 
systematic learning effect is rather surprising, 
and may be due to the fact that the Ss were 
research workers experienced in the match- 
ing of dials on electronic apparatus. 

Intertask variation. A two-way classifica- 
tion of Ss against tasks can be made by sum- 
ming the data over days. An analysis of 
variance for data classified in this way is pre- 
sented in Table 1, and it can be seen that for 
each of the three parameters examined there 
is a significant component of the total vari- 
ance attributable to intertask differences. 

Further analysis of intertask differences is 
made possible by calculating the over-all mean 
value for each task, and arranging these 
values in order of magnitude (Table 2). The 
tasks are all of similar nature, and it is there- 
fore reasonable to assume that the error vari- 
ance of Table 1 is normally distributed be- 
tween the different tasks. Calculating critical 
difference levels on this assumption, it is possible
to assess the significance of individual
intertask differences at different levels of
probability. The three parameters of psychomotor
function will be considered separately.

Table 1

Variance of Test Scores Under Resting Conditions.
Two-Way Classification of Data

(Ss against performance with individual tasks)

Measure                  Sum of     Mean       Variance
                         Squares    Square     Ratio       P

Initial response time
  Due to Ss               6.601     1.320      143.6
  Due to tasks            0.362     0.0241       2.62
  Due to error            0.689     0.0092
  Total                   7.652

Total response time
  Due to Ss              11.76                             <.001
  Due to tasks           48.57                             <.001
  Due to error           13.68
  Total                  74.01

Error
  Due to Ss               0.0115    0.0023       6.7       <.001
  Due to tasks            0.2148    0.0143      43.3       <.001
  Due to error            0.0250    0.00033
  Total                   0.2514
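
The variance ratios in Table 1 can be checked from the sums of squares given there for initial response time (6.601 for Ss, 0.362 for tasks, 0.689 for error). The sketch below assumes a two-way classification of 6 Ss by 16 tasks with one summed score per cell, which gives 5, 15, and 75 degrees of freedom; these degrees of freedom are inferred, not quoted from the source, and the function name is hypothetical.

```python
N_SUBJECTS = 6
N_TASKS = 16

df_subjects = N_SUBJECTS - 1          # 5
df_tasks = N_TASKS - 1                # 15
df_error = df_subjects * df_tasks     # 75


def variance_ratios(ss_subjects, ss_tasks, ss_error):
    """Return (ratio for Ss, ratio for tasks) from the three sums of squares."""
    ms_subjects = ss_subjects / df_subjects
    ms_tasks = ss_tasks / df_tasks
    ms_error = ss_error / df_error
    return ms_subjects / ms_error, ms_tasks / ms_error


# Initial response time, sums of squares as given in Table 1.
ratio_ss, ratio_tasks = variance_ratios(6.601, 0.362, 0.689)
print(round(ratio_ss, 1), round(ratio_tasks, 2))
```

The computed ratios fall close to the tabled values of 143.6 and 2.62; the small differences are of the size expected from rounding of the tabled mean squares.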

1. Initial response time. The mean value 
for initial response time is .981 units (.77 
sec.). Most of the 16 tasks show initial re- 
sponse times that are distributed fairly closely 
about this mean value, but four tasks (6, 11, 
10, and 8) yield values that are significantly 
greater than the remainder. The quantity 
that is being measured is by no means a 
“simple” reaction time, being at least four 
times greater than the time required to re- 
spond to simple visual signals (4, 9), and 
there are probably several factors contribut- 
ing to intertask differences. These would in- 
clude the pattern of the initial visual stimu- 
lus (in Tasks 4, 13, 6, and 11, the needle 
swings through a comparatively small angle), 
and distractions caused by failure to complete 
the previous task (particularly with Tasks 
10, 8, 2, and 16). 


Fig. 2. Circuit diagram for pursuit meter.
Table 2

Intertask Differences. Mean Values for Individual Tasks Arranged in Descending Order of Performance

Initial Response Time     Total Response Time     Error
Task   Mean (units)       Task   Mean (units)     Task   Mean (cm.)
12     0.895              ?      2.54             11     0.037
9      0.921              ?      2.69             10     0.039
5      0.925              ?      2.84             16     0.039
14     0.925              ?      2.95             6      0.040
7      0.930              ?      2.97             ?      0.044
15     0.940              ?      2.98             ?      0.045
3      0.950              ?      3.01             ?      0.047
1      0.953              ?      3.14             ?      0.048
16     0.984              ?      3.19             ?      0.055
4      0.984              ?      3.30             ?      0.055
2      0.992              ?      3.53             ?      0.058
13     1.012              ?      3.58             ?      0.061
8      1.017              ?      4.43             ?      0.124
10     1.027              ?      4.58             ?      0.142
11     1.102              ?      4.66             ?      0.157
6      1.140              ?      4.73             ?      0.179

Critical Difference Levels for Above Data
P      Initial Response Time   Total Response Time   Error
.05    0.124 units             0.55 units            0.024 cm.
.01    0.176 units             0.78 units            0.033 cm.

(? = task number illegible in the source.)
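
The critical difference levels of Table 2 can be applied mechanically to the ranked means. The sketch below uses the initial response times from the table and flags the tasks whose mean exceeds that of the fastest task by more than the .05 critical difference. Comparing every task against the fastest one is a simplification chosen here for illustration; it is not necessarily the comparison the author made, and the function names are hypothetical.

```python
# Mean initial response time (arbitrary units) per task, from Table 2.
INITIAL_RESPONSE = {
    12: 0.895, 9: 0.921, 5: 0.925, 14: 0.925,
    7: 0.930, 15: 0.940, 3: 0.950, 1: 0.953,
    16: 0.984, 4: 0.984, 2: 0.992, 13: 1.012,
    8: 1.017, 10: 1.027, 11: 1.102, 6: 1.140,
}

CRITICAL_DIFFERENCE_05 = 0.124  # units, P = .05 row of Table 2


def mean_value(means):
    """Grand mean over all tasks."""
    return sum(means.values()) / len(means)


def slower_than_fastest(means, critical):
    """Tasks whose mean exceeds the fastest task's mean by more than
    the critical difference, in ascending task number."""
    fastest = min(means.values())
    return sorted(task for task, m in means.items() if m - fastest > critical)


if __name__ == "__main__":
    print("grand mean:", round(mean_value(INITIAL_RESPONSE), 3))
    print("slower than fastest task at P = .05:",
          slower_than_fastest(INITIAL_RESPONSE, CRITICAL_DIFFERENCE_05))
```

The grand mean of the tabulated values reproduces the .981 units quoted in the text.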


2. Total response time. The mean value 
for total response time is 3.45 units (2.7 
sec.). Values for individual tasks are dis- 
tributed quite widely about this mean value, 
and many differ significantly from each other. 
The most important factor governing the 
time taken over any one task is the sensi- 
tivity in the region of final adjustment, Tasks 
1, 7, 9, and 15 being uniformly difficult in 
this respect. However, where the angle of 
rotation is greater than can conveniently be 
achieved with one wrist movement, this also 
assumes some significance, as in Tasks 8 and 
9 (270°), 12 and 5 (120°). 

3. Error. The mean difference between the 
two voltmeter readings accepted by the six 
Ss was .365 volts. Much of this error is at- 
tributable to four tasks (9, 7, 15, and 1), 
where the extreme sensitivity in the region of 
final adjustment sometimes prevents comple- 
tion of the task within the permitted four 
seconds. Discounting these four tasks, there 
are no significant differences between the re- 
maining 12 tasks. The average accuracy for 


the 12 tasks is .22 v.—this corresponds to 
approximately one-fifth of a scale division, 
and must be close to the limit of achievement 
with a simple voltmeter scale. Thus it would 
seem that under normal conditions Ss persist 
with each task until matching is achieved to 
the best of their visual ability. 

Reliability of the data. A formal analysis 
of reliability may be obtained by calculating 
the odd-even correlation coefficient for suc- 
cessive 5-min. periods of testing. The values 
for initial response time (r = .89) and total 
response time (r = .85) reach the level re- 
quired for a satisfactory test (3), comparing 
well with values obtained for other pursuit 
meters (2). The coefficient for error is low 
(r= .35). This is partly due to difficulty 
in measuring, since this component sometimes 
amounts to less than .1 mm. on the galva- 
nometer tracing. Inaccuracies in the error 
measurement are not normally important if 
Ss persist with each task to the limits of 
visual ability. However, if it is specifically 
desired to measure error, the galvanometer 
deflection for a given voltage imbalance may 
be increased. 
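
The odd-even coefficient described above can be sketched as follows. The exact procedure is not given in the text, so this is only one plausible reading: each subject's scores for successive periods are split into odd-numbered and even-numbered periods, each half is averaged, and the two sets of half-scores are correlated across subjects. The function names and the sample data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def odd_even_reliability(scores_by_subject):
    """Correlate each subject's mean over odd-numbered periods with the
    mean over even-numbered periods.

    scores_by_subject: one list of period scores per subject, in the
    order in which the periods were run.
    """
    odd_means = []
    even_means = []
    for scores in scores_by_subject:
        odd = scores[0::2]    # periods 1, 3, 5, ...
        even = scores[1::2]   # periods 2, 4, 6, ...
        odd_means.append(sum(odd) / len(odd))
        even_means.append(sum(even) / len(even))
    return pearson_r(odd_means, even_means)


if __name__ == "__main__":
    # Invented scores for four subjects over four periods.
    sample = [
        [0.95, 0.97, 0.94, 0.96],
        [1.10, 1.08, 1.12, 1.09],
        [0.88, 0.90, 0.87, 0.91],
        [1.02, 1.00, 1.03, 1.01],
    ]
    print(round(odd_even_reliability(sample), 3))
```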

Change of Performance During Pressure Breathing


The stress situation of a high breathing 
pressure was chosen partly on account of the 
importance of this maneuver in present-day 
aviation physiology, and partly because little 
previous work had been done in this field. 
Two relevant papers (1, 7) have shown some 
decrement of performance using pressure- 
breathing equipment at an altitude of 47,000 
feet. 

The present observations were made at 
ground level (to avoid possible changes due 
to coincident hypoxia), trunk counterpressure 
was provided, and much higher breathing 
pressures were used. For the first 10 experi- 
ments, a pressure of 78 mm. Hg was main- 
tained for 4 min. Changes in performance 
have been expressed as a percentage of con- 
trol values (Table 3). A similar change was 
observed for each of the four minutes. The 
initial response time (A) and error were increased,
while the total response time (B) and the time
occupied by muscular movement (B-A) tended to a
slight decrease.

Table 3

Changes in Psychomotor Performance Produced by Breathing at Pressure of 78 mm. Hg
(10 experiments)

                                             Mean Value (%)   P
First minute
  Initial response time                                       .02-.05
  Total response time                                         .001-.01
  Error
  B-A
Second minute
  Initial response time                                       .20-.10
  Total response time                                         .01
  Error
  B-A
Third and fourth minutes (7 experiments)
  Initial response time                      3.8
  Total response time
  Error

(The remaining numerical entries are illegible in the source. Two further values survive for the third and fourth minutes, 4.1 and +40.7, but their columns cannot be recovered.)

An additional five observations were made
at a breathing pressure of 109 mm. Hg
(Table 4). The error was consistently less
than at the lower breathing pressure. This
probably represents a reaction to experience
of the test under conditions of pressure
breathing; in support of this view it will be
noted that in subject R. J. S. a further test
at 78 mm. Hg gives an even smaller error.
While most of the Ss had previous experience
of pressure breathing, none had previously
endeavored to carry out a skilled task
during the period of pressurization, and it
would seem that practice is required to
achieve a good score under these conditions.
The relevance of this observation to the
indoctrination of aircrew needs no further
emphasis.


Table 4

Comparison of Changes in Performance at Breathing Pressures of (a) 78 mm. Hg, (b) 109 mm. Hg.
Percentage of Control Values

                 Initial Response Time   Total Response Time   B-A
Subject          78 mm.     109 mm.      78 mm.     109 mm.    78 mm.     109 mm.
                 Hg         Hg           Hg         Hg         Hg         Hg
R. J. S. (1)     -11.4      -14.1        -1.8       -6.1       +3.0       -2.8
R. J. S. (2)     -9.7       -12.0        -0.8       -1.5       +1.8       +3.0
J. E.            +22.1      +23.6        -3.4       -2.8       -10.2      -12.2
D. P.            +8.2       +26.2        -1.7       +4.1       -5.3       -4.3
I. H.            +20.6      +28.2        -3.4       ?          ?          -2.4
Mean value for
5 experiments    +6.0       +10.4        -2.2       ?          ?          ?

Error (row alignment not recoverable from the source):
78 mm. Hg: +100.7, +21.5, +15.4. 109 mm. Hg: +97.4, +5.3, -28.1, +4.6.

(? = entry illegible in the source.)
An attempt has been made to define the 
factors underlying the changes of perform- 
ance. The pressure-breathing situation—en- 
closure of the head in a rather hot helmet, 
some increase of respiratory effort, and the 
dull pain of extreme peripheral venous con- 
gestion—tends to produce the reactions of 
panic. There is some increase of muscle-ac- 
tion potentials during pressure breathing,³ 
and it seems reasonable to suggest that 
muscular tension is increased, thus helping 
to produce a faster muscular response. A 
further factor governing both total response 
time and error is the degree of perseverance 
shown by S; experience here helps S to per- 
sist with his task in the face of discomfort 
and a tendency to panic. Some of the in- 
crease of initial response time may be at- 
tributable to central factors, but part at least 
is due to a restriction of lateral movement of 
the head during pressurization. This can be 
overcome by practice and determination, as 
may be seen from the more normal initial re- 
sponse times observed during successive min- 
utes of pressure breathing. 

It is difficult to reproduce the rapid venous 
distension that occurs in all unpressurized 
areas of the body, but some of the changes 
occurring in the arm can be simulated by ap- 
plying a sphygmomanometer cuff. Thirteen 
experiments with the cuff inflated to 78 mm. 
Hg showed a significant fall in the initial re- 
sponse time, and no change in the other pa- 
rameters of performance (Table 5). The 
fall of initial response time may be attributed
tentatively to facilitation by the discomfort
and pain of venous congestion arising from
the same arm. It is probable that more
generalized venous congestion produces a similar
effect; if so, limitation of neck movement
during pressurization has even more influence
on the initial response time than Table 4
would suggest.


Table 5

Changes in Psychometric Performance Produced by
Sphygmomanometer Cuff Inflated to 78 mm. Hg
(cuff applied to right upper arm)
(Results expressed as % of control value; 13 experiments.)

                         Mean Value (%)   SE (%)   t      P
Initial response time    -5.5             ±2.0     2.75   .02-.01
Total response time      +0.7             ±1.7     0.32
Error                    +4.4             ±6.0     0.57
B-A                      +1.0             ±2.1     0.48


Table 6

Changes in Psychometric Performance Produced by
Wearing Pressure-Breathing Helmet
(13 experiments)

                         Mean Value (%)   SE (%)   t
Initial response time    +4.5             ±3.5     1.28
Total response time      +2.7             ±1.0     2.70
Error                    +16.9            ±10.3    1.64
B-A                      +1.7             ±1.8     0.95


3 R. J. Shephard, unpublished observations.

Experiments with the helmet unpressurized 
indicate the effect of wearing the equipment 
alone (Table 6). Even in the unpressurized 
state, lateral movement of the head is less 
easy than it is normally, and this is probably 
responsible for the changes observed—a sig- 
nificant increase of the total response time, 
and some tendency to increase of error. 
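
The t values in Table 6 appear to be simple ratios of each mean change to its standard error, and this can be verified from the printed figures. The sketch below recomputes the ratios and compares them with the two-tailed 5% point of Student's t for 12 degrees of freedom (2.179). The choice of 12 degrees of freedom assumes a one-sample test on the 13 experiments, which the text does not state explicitly; all names are illustrative.

```python
# (mean change %, standard error %, t as printed) from Table 6.
TABLE_6 = {
    "initial response time": (4.5, 3.5, 1.28),
    "total response time": (2.7, 1.0, 2.70),
    "error": (16.9, 10.3, 1.64),
    "B-A": (1.7, 1.8, 0.95),
}

# Two-tailed 5% point of Student's t for 12 degrees of freedom
# (13 experiments minus one; the test design is an assumption).
T_CRITICAL_05_DF12 = 2.179


def t_ratio(mean_change, standard_error):
    """Ratio of the absolute mean change to its standard error."""
    return abs(mean_change) / standard_error


def significant_measures(table, critical=T_CRITICAL_05_DF12):
    """Measures whose t ratio exceeds the critical value."""
    return [
        name
        for name, (mean_change, se, _printed) in table.items()
        if t_ratio(mean_change, se) > critical
    ]


if __name__ == "__main__":
    for name, (mean_change, se, printed) in TABLE_6.items():
        print(f"{name}: computed t = {t_ratio(mean_change, se):.2f}, "
              f"printed t = {printed:.2f}")
    print("significant at the 5% level:", significant_measures(TABLE_6))
```

On these figures only the total response time exceeds the critical value, in agreement with the statement that the helmet alone produces a significant increase of total response time.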

Neither local venous congestion nor the 
wearing of pressure-breathing equipment ac- 
counts for all the changes observed during 
pressure breathing; part of the performance 
decrement must be attributed to other fac- 
tors, including possibly a specific panic re- 
action to the pressurization or a decreased 
cerebral blood flow. 


Discussion 


Aviation psychology is concerned largely 
with performance under conditions of stress, 
and in an aviation laboratory the practical 
value of a psychometric test is often deter- 
mined by its ability to detect changes asso- 
ciated with specific stresses. The apparatus 
described above seems to possess sufficient 
sensitivity to show both an immediate “panic” 
response to the stress of pressure breathing, 
and a progressive improvement as the Ss be- 
come accustomed to performing a skilled task 
in the pressurized state. It shows a number 
of other favorable features. Readings can be 
obtained every 4 sec., and are presented as 
permanent objective records, suitable for sta- 
tistical analysis. Under normal resting con- 
ditions, results for any one S show a satisfac- 
tory reliability, and there is little learning 
effect. The intersubject variation (often con- 
sidered an index of a good test) is quite large. 
Further, the external appearance of the ap- 
paratus bears some resemblance to the con- 
trol panel of an aircraft, and the adjustment 
required is at least as difficult as the tasks 
normally encountered in flying. The chief 
drawback is the time required for the analy- 
sis of the tracings—this could certainly be 
decreased by the use of a planimeter, and 
with adequate training the measurements 
might be made by a good technician. 

It is of some interest to consider the psy- 
chological functions that are being measured 


by the apparatus. The initial response time 


probably represents the time required to read 
the left-hand dial, memorize the reading, and 
initiate appropriate coordinated movements 
of the wrist, although some individuals (de- 
pending partly on personality) may initiate 


movements with inadequate memorization and 
refer to the left-hand dial again during the 
period of adjustment. The total response 
time represents the time required to com- 
plete a fairly simple coordinated task. It in- 
cludes in addition to the initial response time 
the period occupied by muscular movement 
and the time required for making a final 
judgment of accuracy. Sometimes S overshoots
the balance point by 1-2 v., and it is
then possible to measure a further parameter,
the reaction time for small corrective movements.
In contrast to some preliminary observations
of Davis,⁴ the time required to
initiate such movements with the present ap- 
paratus (0.2—0.3 sec.) seems at least as great 
as the expected “simple” reaction time for a 
visual stimulus. The factor governing error 
seems normally to be the ability to read a 
simple voltmeter scale; under conditions of 
stress it is likely that the judgment and perse- 
verance of S also become involved. 


4 R. Davis, personal communication, 1955. 



The apparatus is capable of modification 
to test other psychological functions. The 
adjustment available for the selective study 
of accuracy of performance has already been 
mentioned. For the testing of addition or 
subtraction, a galvanometer in the recording 
camera may be aligned at, for instance, 3 v. 
off balance; S is then required to add or sub- 
tract three from each of the left-hand dial 
readings. For code substitution an additional 
rheostat, R,, is inserted in series with the 
uniselector, and S is given a table showing 
the values of V₂ that are required to balance
different readings of V₁. To test an S’s powers
of discrimination, he may be instructed
not to respond to one reading (for example,
V₁ = 16.4 v.). Finally, if interested in
visual contrast discrimination, special volt- 
meter dials may be prepared having the fig- 
ures more clearly marked at one end of the 
scale than at the other. Other possible appli- 
cations could be described, but these exam- 
ples are sufficient to illustrate the versatility 
of the apparatus. 


Summary 


Description is given of a null-balance elec- 
trical pursuit meter based on a Wheatstone 
bridge circuit. Evaluation in a group of 
normal Ss shows that under resting condi- 
tions it yields repeatable measurements of an 
initial response time and a total response 
time for a coordinated manual task of the 
type encountered in flying an aircraft. Pos- 
sible applications include addition and sub- 
traction problems, code substitution, dis- 
crimination tests, and measurements of visual 
contrast discrimination. 

During the stress of high-pressure breath- 
ing, there is a significant increase of initial 
response time and error, while the total re- 
sponse time tends to be reduced. These 
changes cannot be reproduced by local venous 
congestion or the wearing of pressure breath- 
ing equipment alone, and it is suggested that 
they represent a panic reaction to the pres- 
surization. Training gives a marked improve- 
ment in the ability of all subjects to perform 
the task during the period of pressurization. 


Received November 7, 1955. 
The Duration of Movement Components in a Repetitive 
Task as a Function of the Locus of a 
Perceptual Cue¹ 


J. Richard Simon² 


University of Wisconsin 


This study is concerned with the general 
problem of the role of perceptual processes in 
human motion. The specific variable ma- 
nipulated is the locus of a perceptual cue 
within a repetitive patterned motion. A sim- 
ple assembly operation is chosen for study 
because of its obvious importance in the in- 
dustrial setting and because of the ease of 
defining and timing the parts or components 
of the motion cycle. 

For several years, electronic methods of 
motion analysis have been used to record the 
duration of the component movements in 
various motion cycles (5, 10, 12). Simon 
and Smader (10) extended this methodologi- 
cal approach to the problem of the role of 
perceptual processes in motion. They found 
that a specific visual discrimination imposed 
on part of a motion cycle had a generalized 
effect on the entire motion pattern. In that 
initial study no attempt was made to con- 
trol or vary the point in the work cycle at 
which the discrimination occurred. This 
problem has been instrumentally solved and 
the solution applied in the present study. 

By systematically introducing a perceptual 
cue into the various components of an other- 
wise unchanged pattern of motion, the role 
of the locus of the cue in defining the tem- 
poral relations within the motion cycle was 
determined. Information of this sort may 
be applied to the design of more efficient 
work operations. The evidence presented 
here has relevance also to the evaluation of
some of the basic time-and-motion study concepts.


1 This article is based on a dissertation submitted
in partial fulfillment of the requirements for the
degree of Doctor of Philosophy at the University of
Wisconsin. Financial support of this research came
from the National Science Foundation under a
grant-in-aid for a project on perception and motion,
directed by Karl U. Smith. The writer is indebted to
Professor Smith for his guidance.

2 Now at the American Institute for Research,
Pittsburgh, Penna.


Method 
Apparatus 


Figure 1 is a sketch of the simplified assembly ar- 
rangement used. The assembly plate is 4 in. thick 
and 9 in. square. It is divided in half by a vertical 
line. Sixty-four holes, } in. in diameter and 1 in. 
apart have been machined in the plate, eight holes 
to a row. The parts-supply bin is 5 in. from the 
nearest row of holes in the assembly plate. The bin 
is 52 in. wide and 5} in. long. A curved piece of 
sheet steel forms the floor. Across the front opening 
of the bin is a thin metal bar. The S must reach 
over this bar in order to grasp a pin from the bin. 

The bin contains 80 precision made pins 2 in. in 
diameter and 1 in. long. The ends of the pins are 
slightly tapered. Forty pins are cadmium-plated, 
which gives them a silver color easily distinguish- 
able from the other 40 pins which are copperplated. 
A transparent colored barrier may be easily inserted 
across the front opening of the supply bin. With 
this barrier in place, the pins, though clearly visible, 
are indistinguishable with regard to color. It will 
be shown later how this control over the appearance 
of the pins is used in varying the locus of the per- 
ceptual cue. 

The electronic motion analyzer, pictured at the 
right in Fig. 1, provides separate and automatic 
measurement of the durations of the four compo- 
nents of the work cycle. The total time per trial 
for these components, viz., grasp, loaded travel, as- 
sembly, and unloaded travel, is recorded in hun- 
dredths of a second on four precision clocks. 

The analyzer, described previously (10, 11), con- 
sists of a four-channel electronic relay circuit actu- 
ated by a subthreshold current. The S acts as a 
key in the circuit, i.e., the clocks are automatically 
energized by his movements in performing the task. 

A main part of the apparatus for purposes of this 
study is the cue-control system. It consists of the 
transparent barrier and two neon signal lights lo- 
cated between the bin and the assembly plate. By 
use of the signal lights, the barrier, and proper in- 
structions, it is possible to systematically introduce 
a perceptual cue into any phase of the basic motion 
cycle and to determine its effect on the time re- 
quired for the four component movements of the 
cycle. 
Fig. 1. Sketch of simplified assembly arrangement. (Labels in the sketch: insert bin with transparent barrier in place; barrier makes different colored pins indistinguishable; assembly board.)


Procedure and Experimental Design 


The independent variable was the locus of a per- 
ceptual cue within a pattern of motion. The de- 
pendent variables were the durations of the four 
component movements comprising the motion cycle. 
The S’s task consisted of inserting metal pins into 
holes in an assembly plate. A trial was defined as 
the transporting of 40 pins, one at a time, from the 
parts supply bin to the work area and inserting each 
one in a hole in the plate. This basic repetitive op- 
eration was held constant throughout all the experi- 
mental conditions. 

Six variations of the basic task made up the six 
experimental conditions. The first two variations 
were control conditions. The last four consisted of 
systematically varying the point within the work 
cycle at which a perceptual cue was introduced. 

1. Control—no discrimination. This variation in- 
volved no perceptual cues and therefore served as a 
basis for comparison with Conditions 4 through 6 
below. The S was simply instructed to assemble 
the 40 pins as rapidly as possible, disregarding their 
color. 

2. Control for barrier—no discrimination. This 
condition was the same as Condition 1 except that 
the transparent barrier was inserted in front of the 
bin. The barrier, used to place the perceptual cue 
in the loaded-travel component of the cycle, may 
itself have affected the durations of any or all of the 
component movements in the cycle. Therefore this 
condition was included to serve as the control for 
Condition 3. 

3. Cue in loaded travel. The transparent barrier 
was inserted in front of the bin. The S was in- 
structed to pick up one pin at a time, inserting the 
copper pins in the left side of the plate and the 
silver pins in the right side. Since the barrier made 
the pins indistinguishable with regard to color, S 
was not able to make the necessary discrimination 
until he had grasped a pin and started to carry it to 
the assembly area, i.e., he made the discrimination in 
the loaded-travel part of the cycle. 

4. Cue in unloaded travel. The S was instructed 
to alternate, first picking up a copper pin and then 
a silver pin, until he had 40 pins inserted in the 
plate. Here the discrimination must of necessity 
take place before S can grasp the correct colored 
pin, i.e., in the unloaded-travel part of the work 
cycle. 

5. Cue in grasp. One or the other of two signal 
lights automatically flashed as S grasped a pin from 
the bin. Since S’s contact with the bin activated 
the light and since the light went off as soon as 
grasping the pin was complete, the perceptual cue 
was available during and only during the grasp com- 
ponent of the cycle. If the left light went on, S 
placed the pin he had grasped in the left side of the 
plate. If the right light went on, he placed the pin 
in the right side of the plate. Color of the pins was 
disregarded. Placing the pin in a hole automatically 
advanced a stepping relay which controlled the 
sequence of lights so that upon grasping the next pin, 
a new signal was presented. 

6. Cue in assembly. The same two signal lights 
were used, only this time S’s contact with the as- 
sembly plate presented a signal light and breaking 
contact with the plate turned off the light. The 
light signaled where the next pin was to be placed, 
i.e., as S placed each pin in the plate, he received the 
cue necessary to perform the following motion cycle. 
Grasping a pin from the bin automatically advanced 
the stepping relay so that upon assembling this pin, 
a new signal was presented. 

The method of advancing the stepping relay, as 
just described, eliminated the complicating factor of 
lag or reaction time of the relay. That is to say, 
the signal light was presented simultaneously with 
completion of the circuit by S, since the relay al- 
ready had been advanced into position by the previ- 
ous work movement. The stepping relay generated 
a random sequence of light signals which changed 
from trial to trial. 

Thirty right-handed college students were used as 
Ss. Each S performed under all six variations of 
the task. A latin-square design was used to control 
individual differences and order of presentation of 
the experimental conditions. Each S was assigned 
to one of 30 sequences of conditions provided by 
five independently drawn 6 X 6 latin squares, and he 
performed only in his assigned sequence of condi- 
tions for the duration of the experiment. The Ss 
were run in groups of six per week. Each group of 
Ss completed one of the independent latin squares. 
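
The assignment of 30 Ss to condition sequences can be sketched as below. Each 6 × 6 latin square is built by shuffling the rows, columns, and symbols of a cyclic square, so that every condition appears once in each row (one S's sequence) and once in each column (ordinal position). This construction is only an illustration: the article does not say how its squares were drawn, and shuffling a cyclic square does not sample uniformly from all possible latin squares. All function names are hypothetical.

```python
import random


def random_latin_square(n, rng):
    """Build an n x n latin square by permuting the rows, columns,
    and symbols of the cyclic square (r + c) mod n."""
    base = [[(r + c) % n for c in range(n)] for r in range(n)]
    rows = list(range(n))
    cols = list(range(n))
    symbols = list(range(n))
    rng.shuffle(rows)
    rng.shuffle(cols)
    rng.shuffle(symbols)
    return [[symbols[base[r][c]] for c in cols] for r in rows]


def is_latin_square(square):
    """True if every symbol occurs exactly once per row and column."""
    n = len(square)
    wanted = set(square[0])
    if len(wanted) != n:
        return False
    rows_ok = all(set(row) == wanted for row in square)
    cols_ok = all(
        {square[r][c] for r in range(n)} == wanted for c in range(n)
    )
    return rows_ok and cols_ok


def assign_sequences(n_squares=5, n_conditions=6, seed=0):
    """Return one sequence of conditions (numbered from 1) per S,
    n_squares * n_conditions sequences in all."""
    rng = random.Random(seed)
    sequences = []
    for _ in range(n_squares):
        square = random_latin_square(n_conditions, rng)
        sequences.extend([[s + 1 for s in row] for row in square])
    return sequences


if __name__ == "__main__":
    for subject, sequence in enumerate(assign_sequences(), start=1):
        print(f"S{subject:02d}: {sequence}")
```

Each consecutive block of six sequences corresponds to one weekly group of Ss completing one square.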

The Ss were tested for five consecutive days. On 
Days 1, 2, and 3, S ran through each experimental 
condition twice, making a total of 12 trials. On 
Days 4 and 5 each experimental condition was re- 
peated three times for a total of 18 trials. The ad- 
ditional trials on the last two days were added in 
an attempt to obtain more reliable measures of per- 
formance. 







Results 


The performance of 30 Ss on Day 5 was 
analyzed to determine the effects of the ex- 
perimental conditions on the component 
movements of the work cycle. A median 
score for each component of the motion un- 
der each experimental condition was deter- 
mined. 

Four separate analyses of variance were 
performed,³ one for each of the component 
movements. These analyses⁴ brought out 
the fact that the experimental conditions pro- 
duced significant (p < .01) variations in the 
durations of all four component movements 
of the work cycle, viz., grasp, loaded travel, 
assembly, and unloaded travel. 

Let us now consider separately the four 
components of the task beginning with loaded 
travel. Figure 2 is a bar graph of the mean 
duration of the loaded-travel part of the 
cycle under the six experimental conditions. 
Since the experimental conditions produced 
significant variation in the duration of this 
part of the work movement, the crucial step 
was a comparison between the conditions in- 
volving perceptual cues and the control con- 
ditions in which no specific cues occurred. 
By using as a base the duration of loaded 
travel in the control condition, we were able 
to determine how the loaded-travel compo- 
nent was affected by locating the perceptual 
cue in the four different parts of the cycle. 
To accomplish this comparison a Duncan 
Range Test (4) was performed. The Duncan 
Test determines the significance of differences 
between ranked treatment means in an analy- 
sis of variance. Essentially, it indicates the 
number of significant gaps (at the 5% level) 
between the ranked means. 


3 Before the analyses of variance were computed, 
the data were subjected to the Bartlett chi-square 
test for homogeneity of variance. The data satis- 
fied the assumption of random sampling from popu- 
lations with a common variance. 

4 Summaries of the analyses of variance and other 
statistical tests referred to in this report have been 
deposited with the American Documentation Insti- 
tute. Order Document No. 4973 from ADI Aux- 
iliary Publications Project, Photoduplication Service, 
Library of Congress, Washington 25, D. C., remit- 
ting in advance $1.25 for microfilm or $1.25 for 
photocopies. Make checks payable to Chief, Photo- 
duplication Service, Library of Congress. 
In Fig. 2 the means are ranked from low 
to high. Results of the Duncan Test are in- 
dicated by the brackets. Note that the dura- 
tion of the loaded-travel movement is great- 
est when the perceptual cue is located in 
loaded travel. The Duncan Test indicated 
that this mean was significantly different 
from all the means ranked below it. The 
next longest duration of the loaded-travel 
movement occurred when the perceptual cue 
was placed in the grasp component of the 
cycle. This mean, too, differs significantly 
from all the others. There are no gaps be- 
tween the remaining four means indicating 
that these experimental conditions do not 
produce significant variations in the duration 
of loaded travel. 

To summarize the results presented in Fig. 
2, a cue placed in the loaded-travel or grasp 
components significantly increased the dura- 
tion of loaded travel over the time required 
for the same motion in a control condition in- 
volving no perceptual cues. However, a perceptual
cue placed in the assembly or unloaded-travel
components did not significantly
affect the duration of loaded travel.

Fig. 2. Duration of loaded-travel component under
the six experimental conditions. Brackets indicate
significant gaps between ranked treatment means.

Fig. 3. Duration of unloaded-travel component
under the six experimental conditions. Brackets
indicate significant gaps between ranked treatment
means.

Figures 3, 4, and 5 present the remaining 
three parts of the work cycle in a similar 
fashion. Figure 3 shows the means of the 
unloaded-travel part of the work cycle. It 
can be noted that a cue placed in unloaded 
travel or loaded travel significantly increased 
the duration of the unloaded-travel motion 
over the time required for the same motion 
under a control condition involving no per- 
ceptual cue. However, a perceptual cue 
placed in the assembly or grasp components 
of the work cycle did not significantly affect 
the duration of unloaded travel. 

Figure 4 pictures the mean duration of the 
grasp component under the six experimental 
conditions. Placing a perceptual cue in any 
of the components of the task, viz., assembly, 
grasp, loaded travel, or unloaded travel, had 
the effect of increasing the duration of the 
grasp component over the time required for 
the same motion under a control condition 
involving no perceptual cue. There was also 
a significant gap between the two control 
conditions. This difference means that the 
presence of the transparent barrier per se in 
some way increased the duration of the grasp 
component. 

Figure 5 indicates that the time required 
for the assembly component was significantly 
increased only by placing the cue in the as- 
sembly part of the cycle. 

The principal findings of the present study 
are summarized in Table 1. Results of the 
four separate Duncan Tests are integrated in 
order to show at a glance the effects of any 
specific cue locus on the durations of all four 
component movements in the work cycle. In 
all cases the time required for a component 
movement is compared with the duration of 
the same movement under the appropriate 
control condition in which no perceptual cue 
was involved. For example, in line 1 of 
Table 1, when unloaded travel was the locus 
of the perceptual cue, we find a significant
increase in the duration of the unloaded-travel
component. Duration of the grasp
component of the cycle also increased significantly.
The times for loaded travel and
assembly were not significantly altered.

Fig. 4. Duration of grasp component under the
six experimental conditions. Brackets indicate
significant gaps between ranked treatment means.

Effects of practice on the component move- 
ments. Differences noted between the ex- 
perimental conditions on Day 5 appeared 
consistently over the first four days of prac- 
tice as well. The average decrease in the 
duration of the grasp component from Day 1 
to Day 5 was 23%. The assembly compo- 
nent decreased 8% over the same period. 
The durations of loaded travel and unloaded 
travel both decreased 14%. 

A test was made of the hypothesis that 
there is no difference in the amount of learn- 
ing which takes place in a component involv- 
ing a perceptual cue and in the same compo- 
nent when no perceptual cue is involved. In 
all cases we were able to reject the null hy- 
pothesis (p< .05). Thus, it appears that 
more learning occurs in a component when it 
is perceptually loaded than when it is not. 
Fig. 5. Duration of assembly component under 
the six experimental conditions. Brackets indicate 
significant gaps between ranked treatment means. 
Table 1

Effects of Cue Locus on the Durations of the Four
Component Movements in the Work Cycle

                          Movement Component
Locus of the              Unloaded            Loaded
Perceptual Cue            travel     Grasp    travel    Assembly
Unloaded travel           +          +        0         0
Grasp                     0          +        +         0
Loaded travel             +          +        +         0
Assembly                  0          +        0         +

Note. + indicates component time significantly increased;
0 indicates component time not significantly altered.
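
The pattern of effects summarized in Table 1, and described component by component in the Results, can be written down as a small lookup structure and queried directly. The sketch below encodes, for each cue locus, the set of components whose duration was significantly increased, and then checks two properties: that the component containing the cue is always slowed, and that at least one other component is slowed as well. The encoding and the function names are illustrative only.

```python
COMPONENTS = ("unloaded travel", "grasp", "loaded travel", "assembly")

# For each cue locus, the components whose duration increased
# significantly relative to the appropriate control condition.
EFFECTS = {
    "unloaded travel": {"unloaded travel", "grasp"},
    "grasp": {"grasp", "loaded travel"},
    "loaded travel": {"unloaded travel", "grasp", "loaded travel"},
    "assembly": {"grasp", "assembly"},
}


def own_component_slowed(effects):
    """True if every cue locus slows the component that contains it."""
    return all(locus in slowed for locus, slowed in effects.items())


def spills_over(effects):
    """True if every cue locus slows at least one other component."""
    return all(len(slowed - {locus}) >= 1 for locus, slowed in effects.items())


def loci_affecting(component, effects):
    """Cue loci that significantly lengthen the given component."""
    return sorted(locus for locus, slowed in effects.items()
                  if component in slowed)


if __name__ == "__main__":
    print("cue always slows its own component:", own_component_slowed(EFFECTS))
    print("cue always slows another component:", spills_over(EFFECTS))
    for component in COMPONENTS:
        print(f"{component} lengthened by cue in:",
              loci_affecting(component, EFFECTS))
```

Read by column, the same structure shows that grasp is lengthened by a cue at any locus, whereas assembly is lengthened only by a cue in assembly itself.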


Reliability of measures. Median scores 
from Day 4 for each movement component 
under each experimental condition were cor- 
related with the comparable measure from 
Day 5. In general, all four components of 
the task showed a high level of consistency 
from day to day. The correlation coefficients 
were of the order of + 0.80 to + 0.90. 


Discussion 


Two broad generalizations are suggested. 
First of all, it is apparent that a perceptually 
loaded component takes significantly more 
time than its counterpart which involves less 
perceptual load. We reached this conclusion 
by comparing the duration of any component 
when it was the locus of a perceptual cue 
with the duration of the same component 
under a control condition where no cues were 
involved. This result held true regardless of 
whether the cue was placed in unloaded 
travel, grasp, loaded travel, or assembly. Re- 
cently, Seymour (9) reported a study in 
which he varied the perceptual load of a 
constant length movement and found that 
duration of the movement increased as the 
perceptual requirements increased. As far as 
is known, the present study is the first in 
which the perceptual load of the manipulative 
components of a task, i.e., assembly and 
grasp, has been changed without actually 
altering the character and complexity of the 
movement. 

A second generalization perhaps is of greater 
importance. Regardless of the point in the 
work cycle at which the cue was placed, the 
duration of at least one other component besides 
the one containing the perceptual cue 
was significantly increased. For example, 
when the locus of the cue was unloaded 
travel or grasp, the duration of the immedi- 
ately adjacent movement component was sig- 
nificantly increased. It might appear that 
the information presented during the previ- 
ous component was organized or processed 
centrally and that this somehow increased 
the duration of the movement. However, 
when the locus of the cue was loaded travel 
or assembly, there appeared to be little con- 
nection between where the cue was presented 
and acted upon and which specific parts of 
the movement were affected. It is these spe- 
cific results that could not have been pre- 
dicted from existing evidence about the na- 
ture of high-speed performance (13). 
Explanation of the far-reaching effects of 
placing the cue in loaded travel probably lies 
in an interruption of an ongoing movement 
affecting the over-all rhythm of the task. 
Here, S had to make a discrimination and 
act upon it immediately. The 65% increase 
in the duration of loaded travel and the sig- 
nificant increases in grasp and unloaded travel 
emphasize the importance of advance information 
(7) for the smooth functioning of 
sensorimotor skills. Why the duration of assembly 
remained unaltered is not readily apparent. 
However, this does serve to illustrate 
that the effect of the perceptual cue is 
not a simple spillover onto immediately adjacent 
components. 

The present results show conclusively that 
variations in the perceptual requirements of 
one part of the work cycle significantly affect 
the durations of other parts of the cycle. 
This finding is damaging evidence against a concept 
implicit in a good deal of the writing on 
time-and-motion study. This concept, that a 
movement pattern can be treated as a com- 
bination of independent and discrete ele- 
ments, provides the basis for a large number 
of predetermined time-standard systems. To 
set the standard for a new job, the operation 
is analyzed into elements, each element is as- 
signed a predetermined standard time, and 
the total of the element times with certain 
adjustments becomes the time allowed for 
the job. 
There have been, in the past, suggestions to 
the effect that the components of a movement 
are not independent (2, 8, 10). However, 
the existence of the wide variety of predetermined 
time standard systems in itself 
provides abundant proof that this point of 
view is not widely known or accepted. 
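
The additive assumption described above can be made concrete with a small sketch. The element times and the allowance parameter are hypothetical illustrations, not values from this study; only the 65% increase in loaded travel is taken from the text.

```python
def standard_time(element_times, allowance=0.0):
    """Additive time-standard model: sum the element times and apply a
    fractional allowance. Assumes the elements are independent."""
    return sum(element_times.values()) * (1.0 + allowance)


# Hypothetical element times in seconds (not data from the study).
elements = {
    "unloaded_travel": 0.40,
    "grasp": 0.30,
    "loaded_travel": 0.40,
    "assembly": 0.50,
}

base = standard_time(elements)

# An additive system would adjust only the element that carries the cue,
# here loaded travel, by the 65% increase reported in the text.
cued = dict(elements, loaded_travel=elements["loaded_travel"] * 1.65)
predicted = standard_time(cued)

print(f"base cycle: {base:.2f} s, additive prediction with cue: {predicted:.2f} s")
```

A prediction of this kind leaves grasp and unloaded travel unchanged, whereas the present results indicate that both increased when the cue was placed in loaded travel.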

Some of the predetermined time systems 
do attempt to handle perceptual factors either 
by considering the perceptual response as a 
separate element in the motion cycle and as- 
signing it a standard time (3, 6), or by mak- 
ing allowances for such things as the degree 
of visual control required for the movement 
(1). Even if it were possible to measure the 
perceptual requirements of each part of the 
work cycle and adjust the time standards ac- 
cordingly, the problem still remains of con- 
sidering the effect of the perceptual loading 
of one factor on the durations of the other 
parts of the task. The question which arises 
is whether any predetermined time system 
will ever be able to predict accurately this 
complicated interrelation between perceptual 
processes and motion. 


Summary 


This study was concerned with the inter- 
relation of perceptual processes and work 
movements. The specific variable manipu- 
lated was the locus of a perceptual cue within 
a repetitive patterned motion. A simplified 
assembly task was used. It consisted of inserting 
40 metal pins into holes in an assembly 
plate. Special techniques made possible 
the specification and control of the exact 
point in this work cycle at which a perceptual 
cue was located. By systematically introducing 
the cue into various components of 
an otherwise unchanged pattern of motion, 
the effects of the locus of the perceptual cue 
in defining the temporal relations within the 
motion cycle were determined. Electronic 
methods of motion analysis were used to re- 
cord separately and automatically the dura- 
tions of the four component movements of 
the work cycle, viz., unloaded travel, grasp, 
loaded travel, and assembly. 

Results indicated that, depending upon the 
locus of the cue, the durations of some components 
were increased while others were not 
affected. Two generalizations were suggested. 
First, it is apparent that a perceptually loaded 
component takes significantly longer than its 
counterpart which involves less perceptual 
load. Secondly, placing a perceptual cue in 
one part of a work cycle not only affects the 
duration of that part of the cycle, but also 
significantly affects the durations of certain 
other parts of the movement. This finding 
is damaging evidence against a concept implicit in 
a good deal of the writing in the time-and-motion 
study field. This concept is that a 
work movement can be treated as a combina- 
tion of independent and discrete elements. 
It seems unlikely that existing predetermined 
time systems can handle accurately the com- 
plicated interrelation between perceptual proc- 
esses and motion which has been demon- 
strated in this study. 


Received December 13, 1955. 


The Electronic Handwriting Analyzer and Motion Study 
of Writing 1, 2 


Karl U. Smith and Richard Bloom 


University of Wisconsin 


Although writing is an almost universal 
skill in modern Western society, there have 
been few if any really scientific studies of the 
different motions used in this performance. 
Recent studies in this field have been di- 
rected very largely to the questions of the 
pressure patterns in writing (2, 3, 4, 5, 8), 
and have involved the over-all measurement 
of the force and speed of writing. Older lit- 
erature of a psychological nature in the field 
dealt with the use and validity of graphologi- 
cal indices in indicating certain personality 
traits. 

Like many other areas of motion study, the 
scientific investigation of movement patterns 
in handwriting has been limited by the meth- 
ods available for studying performance. Our 
primary concern here, then, is an attempt to 
improve methods of studying the movement 
characteristics of writing skills. Specifically, 
we have applied the principles of electronic 
motion analysis (1, 6, 7) to the measurement 
of the time required for the component move- 
ments in writing. A preliminary investiga- 
tion is described here in which an Electronic 
Handwriting Analyzer is used to measure the 
durations of manipulation (contact) and 
travel movements in a writing task. 


Methods and Apparatus 


Figure 1 illustrates the general method used in the 
Electronic Handwriting Analyzer. This device is 
based on a principle of using the writer as an elec- 
tronic key, so that each time he touches the surface 
of the writing paper, he automatically starts and 
stops precision time clocks that measure the dura- 
tion of his writing movements. 

1 This research was supported by funds from the 
National Science Foundation under a grant for the 
study of perception and motion. Financial support 
for building the equipment for this study came from 
the Graduate School Research Committee, The Uni- 
versity of Wisconsin. 

2 The basic computational work of this research 
was aided by the Computing Service, The University 
of Wisconsin. 

As shown in Fig. 1, the subject (S) sits at an 
ordinary writing table. In his left hand he holds a 
metal electrode that is attached to the switching 
circuit of an electronic motion analyzer. The sub- 
ject writes with a metal pencil filled with IBM 
electrographic leads. The paper used, Teledeltos paper,3 
is also electrically conductive, and is connected 
to the switching circuit of the handwriting 
analyzer located inside the electrical housing to the 
right of Fig. 1. When S makes contact with the 
writing paper, he immediately closes one side of the 
switching circuit of the handwriting analyzer, which 
operates a precision time clock.4 When the pencil 
is lifted from the paper after writing a letter or 
word, this first clock is automatically stopped and 
a second clock starts to run. The second clock 
measures the duration of the travel movement between 
the first writing contact and any subsequent 
contact with the paper. 

The writing situation in this method is quite nor- 
mal except for the use of the special paper, the sur- 
face of which appears to be somewhat smoother 
than that of ordinary bond paper. The S does not 
feel at all the very low-level current that is passed 
through his body in operating the handwriting ana- 
lyzer. Special precautions are taken to prevent the 
possibility of electric shock. 

Figure 2 shows diagrammatically the circuits used 
in the Electronic Handwriting Analyzer. As noted 
above, the subject in this task acts as an electronic 
key completing a switching circuit, Mr. This relay 
is closed when S touches the paper with his pencil, 
starting the manipulation clock, Mr, that continues 
to run as long as the pencil touches the paper. 
When S lifts his pencil, however, to travel to the 
next letter or word position, the switching circuit 
automatically stops the first clock and starts the 
clock, Tr, which measures the duration of this travel 
movement. When the pencil touches the paper 
again, the travel clock, Tr, is stopped and the manipulation 
clock, Mr, started again. Upon the completion 
of one line of writing across the paper, the 
pencil touches the stopping plate on the right margin, 
automatically stopping both clocks. 
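
The two-clock logic described above can be sketched as a small state machine. This is an illustrative software analogue of the relay circuit, not the authors' apparatus; the event names ("down", "up", "stop") and the sample timestamps are assumptions made for the example.

```python
def accumulate_clocks(events):
    """Return (manipulation_time, travel_time) in seconds.

    `events` is a list of (timestamp_seconds, kind) pairs, where kind is
    "down" (pencil touches paper), "up" (pencil lifted), or
    "stop" (pencil touches the stopping plate).
    """
    manipulation = 0.0
    travel = 0.0
    running = None   # which clock is running: "M", "T", or None
    last_t = None
    for t, kind in events:
        # Credit the elapsed interval to whichever clock was running.
        if running == "M":
            manipulation += t - last_t
        elif running == "T":
            travel += t - last_t
        # Switch clocks according to the new event.
        if kind == "down":
            running = "M"
        elif kind == "up":
            running = "T"
        elif kind == "stop":
            running = None
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
        last_t = t
    return manipulation, travel


# Hypothetical line of two characters followed by the stopping plate.
events = [(0.0, "down"), (0.5, "up"), (0.7, "down"), (1.3, "up"), (1.5, "stop")]
manip, trav = accumulate_clocks(events)
print(f"manipulation {manip:.2f} s, travel {trav:.2f} s")
```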

With certain modifications this analyzer is being 
made into a portable instrument that can be car- 
ried into the schoolroom or hospital room. We look 
upon this device as a widely applicable instrument 
for the measurement of psychomotor skill for vari- 
ous psychological and educational uses, particularly 
in medical, industrial, military, and school situations. 

3 Western Electric Co., New York City. 
4 Potter Instrument Co., Boston, N. Y. 


Fig. 1. The Electronic Handwriting Analyzer. The subject holds an electrode in the left hand and 
writes on Teledeltos paper. The switching circuits of the handwriting analyzer are located inside the interval 
timer to the right. One of the interval timers, the one recording travel time, cannot be seen in this 
picture. 


Fig. 2. The circuit relations of the handwriting 
analyzer. The switching circuit is marked as Mr 
and Tr, and the interval timers as Mr and Tr. The 
writing on the paper completes the manipulation 
component of the circuit, and the travel component 
is automatically operated thereafter. Touching the 
stop plate stops the travel clock. 


In the present preliminary study, the Electronic 
Handwriting Analyzer has been applied to the measurement 
of writing single Arabic numerals and letters 
of the alphabet. Ten college student Ss wrote 
35 characters according to certain limited control 
conditions. Each S wrote the 35 characters in different 
orders chosen in terms of a randomizing procedure. 
One letter was written in each trial. The 
S was instructed to begin at the left of the writing 
paper and write a given letter or number repeatedly 
across the paper to the right edge where the last 
character was to be written on the stopping plate. 
He was instructed to write at his usual speed, which 
meant in most cases that a line of writing included 
10 to 15 distinct letters or numbers. There was no 
attempt in the experiment to obtain data on speed 
writing or to change the usual method of writing in 
any way. One possible exception to this latter statement 
is that S was asked to omit dotting the i and 
j, and crossing the t. 

In order to obtain the data of this study, the mean 
manipulation and travel times for each character for 
each individual S were computed by dividing the 
recorded times by the number of distinct letters 
written in that trial. Analysis of variance of these 
manipulation and travel measures was carried out 
to determine whether, in the 10 Ss, a significant difference 
occurs between characters and between Ss 
in regard to the measures of manipulation and 
travel time. 
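
The per-character scores described above are simple quotients of the clock totals for a line of writing. A minimal sketch, assuming hypothetical clock readings and letter count (none of these numbers come from the study):

```python
def per_character_times(manip_total, travel_total, n_letters):
    """Mean manipulation and travel time per character for one trial,
    obtained by dividing each clock total by the number of letters written."""
    if n_letters <= 0:
        raise ValueError("n_letters must be positive")
    return manip_total / n_letters, travel_total / n_letters


# Hypothetical trial: 12 letters, 6.0 s on the manipulation clock,
# 2.4 s on the travel clock.
mean_manip, mean_travel = per_character_times(6.0, 2.4, 12)
print(f"manipulation {mean_manip:.3f} s, travel {mean_travel:.3f} s per character")
```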

In addition to the determination of the signifi- 
cance of differences between characters in regard to 
the duration of the travel and manipulation move- 
ments involved, correlations between the travel and 
manipulation scores for individual Ss were also computed, 
and a special measure of writing coordination, 
the manipulation-travel ratio, was computed. 

Table 1 


The Mean and Standard Deviation Values for Manipulation 
Time and Travel Time Required to Write 
Common Numbers and Letters of 
the Alphabet 
(Time values are in seconds) 


Manipulation (Seconds)    Travel (Seconds) 
Mean   σ                  Mean   σ 


087 
074 
.065 
.068 
092 


Character 





56 .047 .20 
63 113 Re 
Al .192 

56 133 

Al 081 

5Y 

59 

61 

44 

57 

82 

46 

71 


orf 


a 


— ore eo > oO 


B 


089 
085 
110 
.087 
087 
074 
091 


COIWAMEWHENM X 


Results 


The results of this study will be discussed 
in relation to the following general topics: 
(a) differences in the duration of the manipulation 
and travel movements in writing 
numbers and letters of the alphabet, and (b) the 
nature of individual differences in the manipulation 
and travel movements. 


Table 1 summarizes the means and their 
standard deviations of the durations of the 
manipulation (contact) and travel movements 
for the 35 characters written in this experi- 
ment. It is evident that there are marked 
differences in the duration of the manipula- 
tive movements in writing common numbers 
and the letters of the alphabet. The simpler 
letters, such as i, c, and e, show shorter ma- 
nipulation times, whereas more complex let- 
ters such as k, m, q, and w, are of relatively 
long duration. There is little evidence in the 
data to indicate that letters occurring infre- 
quently in writing take a longer time to write 
than those occurring frequently. 

In contrast to the manipulative movements, 
the travel movements between single letters 
in writing remain fairly constant in duration, 
rarely exceeding .3 sec. 

Table 2 presents summaries of the analysis 
of variance of the measurements of manipulation 
and travel movements related to different 
single numbers and letters of the alphabet. 
As shown in this table, a significant 
difference between the characters occurs for 
manipulation, but not for travel movements. 
The F value for characters in the case of 
manipulation is significant at the 1% level. 
The F for characters in the case of the travel 
movements is not significant. Under the con- 
ditions of this study there were no significant 
differences in the travel movement scores. 
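
The arithmetic behind an analysis-of-variance summary of this kind can be checked with a short sketch. The sums of squares and degrees of freedom are those printed in Table 2; the function name and dictionary layout are illustrative assumptions.

```python
def f_ratios(table):
    """Compute F = (SS / df) / (SS_residual / df_residual) for each source.

    `table` maps a source name to (sum_of_squares, degrees_of_freedom)
    and must contain a "Residual" entry.
    """
    ss_res, df_res = table["Residual"]
    ms_res = ss_res / df_res
    return {
        source: (ss / df) / ms_res
        for source, (ss, df) in table.items()
        if source != "Residual"
    }


# Values as printed in Table 2.
manipulation = {"Characters": (4.13, 34), "Ss": (3.06, 9), "Residual": (2.00, 306)}
travel = {"Characters": (0.355, 34), "Ss": (0.414, 9), "Residual": (8.082, 306)}

f_manip = f_ratios(manipulation)
f_travel = f_ratios(travel)
for name, result in (("manipulation", f_manip), ("travel", f_travel)):
    for source, f in result.items():
        print(f"{name:12s} {source:10s} F = {f:.2f}")
```

Recomputed from the unrounded sums of squares, the manipulation F ratios come out somewhat lower than the printed 20.1 and 56.6, which are close to the ratios of the rounded mean squares (.121/.006 and .340/.006). Either way both ratios are very large, and the travel ratio for Ss reproduces the printed 1.74.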

Table 2 

Summaries of Analysis of Variance 

Source        Sum of Squares    df    Mean Square    F 

Manipulation (Contact) 
Characters        4.13          34       .121       20.1** 
Ss                3.06           9       .340       56.6** 
Residual          2.00         306       .006 

Travel 
Characters         .355         34       .0104 
Ss                 .414          9       .0460       1.74 
Residual          8.082        306       .0264 

** Significant at the 1% level. 


Table 3 

Characteristics of Individual Differences in 
Writing Letters of the Alphabet 

           Manipulation      Travel     M/T       M-T 
Subject    Mean     σ        Mean       Ratio     Correlation 
 1         .640    .191      .269       2.38      +.12 
 2         .566    .154      .280       2.02       .00 
 3         .413    .105      .243       1.70      +.06 
 4         .462    .136      .209                 -.31* 
 5         .478    .120      .279                 +.57** 
 6         .553    .121      .315                 +.13 
 7         .581    .119      .230                 -.21 
 8         .423    .091      .208                 -.09 
 9         .641    .149      .263                 -.07 
10         .698    .107      .212                 +.11 

* Significant at the 5% level. 
** Significant at the 1% level. 


The individual differences among the 10 
Ss of this study are summarized in Table 3. 
This table is presented primarily to indicate 
certain important features of the motion 
analysis of handwriting and in particular the 
nature of special measures of individual differences 
in motor coordination that it is possible 
to make with the present methods. 
The second and third columns of Table 3 
summarize the means and standard deviations 
of the duration of manipulation movements 
for each of the 10 Ss in writing 35 letters and 
numbers. The third and fourth columns give 
the equivalent measures for travel movements. 
Measures of manipulation show a 
wider range of variation than do the 10 individual 
travel measures. The individual 
standard deviations for manipulation are 
roughly double those for travel movements. 
Special indices of motor coordination may 
be derived from measures of the handwriting 
task. One of these special indices is the ratio 
between the duration of manipulation move- 
ments and the duration of travel movements 
in the different writing performances. We 
refer to this index as the M/T handwriting 
ratio (see Table 3). Such a ratio is a general 
measure of relative timing and rhythm 
in the psychomotor performance. Continuing 
investigations suggest that this index will be 
of value in studying the effects of various psy- 
chological variables on the handwriting task. 
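
As a worked illustration of the M/T ratio, the sketch below divides a subject's mean manipulation time by the mean travel time. The two pairs of means are those legible for two subjects in Table 3; the function name is an assumption for the example.

```python
def mt_ratio(mean_manipulation, mean_travel):
    """Manipulation-travel ratio: mean manipulation time / mean travel time."""
    if mean_travel <= 0:
        raise ValueError("mean_travel must be positive")
    return mean_manipulation / mean_travel


# (mean manipulation, mean travel) in seconds, as read from Table 3.
subjects = {
    "second subject": (0.566, 0.280),
    "third subject": (0.413, 0.243),
}

ratios = {name: mt_ratio(m, t) for name, (m, t) in subjects.items()}
for name, value in ratios.items():
    print(f"{name}: M/T = {value:.2f}")
```

A ratio near 2 means that roughly twice as much time is spent in contact with the paper as in travel between characters.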
Another general index of motor coordina- 
tion in handwriting is the manipulation-travel 
correlation value. In the present study such 
an index of individual differences is derived 
from correlating the measures of manipula- 
tion and travel for the 35 numbers and let- 
ters for each S. The correlation value ob- 
tained refers to the degree of relation be- 
tween manipulation and travel for a given 
single individual. Although chance may be 
operating to produce the results, this correlation 
value varies sharply for different individuals. 
For one S it is significantly negative 
and for another, significantly positive, 
but for the other Ss used it is not significantly 
different from zero. 
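
The manipulation-travel correlation for a single S can be sketched as an ordinary product-moment correlation over that subject's per-character scores. The article does not state which test of significance was used; the t transformation below is one conventional choice and is an assumption, as are the sample scores.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length, n >= 3")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value with n - 2 degrees of freedom for testing r against zero."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical per-character scores (seconds) for one S.
manipulation = [0.41, 0.56, 0.63, 0.44, 0.82]
travel = [0.20, 0.19, 0.21, 0.22, 0.20]
r = pearson_r(manipulation, travel)

# t for the largest correlation reported in Table 3, with 35 characters.
t_largest = t_statistic(0.57, 35)
print(f"r = {r:+.2f}; t for r = +.57, n = 35: {t_largest:.2f}")
```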


Summary and Discussion 


The general study of handwriting as a psychomotor 
skill has never been investigated 
adequately. This research applies, appar- 
ently for the first time, precise methods of 
motion analysis to the investigation of writ- 
ing skill. Specifically, an Electronic Hand- 
writing Analyzer is described that permits 
separate and automatic measurement of the 
component movements of manipulation and 
travel in the writing task. Results bearing 
on the application of this electronic hand- 
writing analyzer to the measurement of writ- 
ing common numbers and letters of the al- 
phabet are summarized. 

In preliminary results obtained on 10 Ss 
on the writing of single numbers and script 
letters, it was found that the manipulation 
(contact) movement varies significantly in 
duration in writing different letters and num- 
bers. The travel movements associated with 
writing these same letters showed much less 
variation both in relation to the letters and 
numbers written and in relation to individu- 
als. It is very evident that the Electronic 
Handwriting Analyzer provides an effective 
device for measuring the duration of the com- 
ponent movements in writing under almost 
any conditions. 

Inasmuch as the handwriting analyzer pro- 
vides separate measures of manipulation and 
travel in writing performance, it is possible 
to obtain relational measures of motor co- 
ordination with this instrument. One such 
measure is the manipulation-travel ratio (or 
M/T ratio), an index obtained by dividing 
the manipulation time in a given individual 
or condition by the duration of the associated 
travel movements. Another measure of motor 
coordination is the manipulation-travel cor- 
relation, a value representing the relation be- 
tween the duration of manipulation and 
travel movements. Both of these special 
measures of motor coordination provide in- 
teresting new quantitative expressions of vari- 
ables to be studied in the handwriting task. 

Because of the almost universal nature of 
writing as a psychomotor skill among both 
young and old, precise measurements of writ- 
ing are possibly of very broad significance in 
the general analysis of the motor coordina- 
tion in relation to growth, aging, learning, 
and other psychological factors. The Electronic 
Handwriting Analyzer provides the instrumentation 
essential for such research. 
The device could also be used diagnostically 
in the educational, medical, and industrial 
measurement of motor performance. 


Received December 5, 1955. 


Scale reading on a polar coordinate display 
usually involves two processes. One 
process involves identifying the value of the 
scale marker nearest the target or “pip.” 
The other process involves estimating the po- 
sition of the target between scale markers. 
This paper is concerned with both these proc- 
esses as they relate to the speed and accuracy 
of reading target position on a polar coordi- 
nate display. 

Leyzorek (9) and Baker (1) have studied 
interpolation accuracy as a function of the 
distance between scale rings on polar coordi- 
nate displays. Leyzorek found that the av- 
erage error of interpolation was 4% of the 
interval for scale ring separations of .5 in. or 
greater. For scale ring separations of less 
than .5 in. the interpolation error increased. 
Leyzorek’s finding of the .5-in. “critical in- 
terval” agrees generally with the findings of 
Grether and Williams (7) and Carr and 
Garner (3) who varied the separation dis- 
tance between reference marks similar to 
those found on dials. Baker, however, found 
that the error of interpolation was constant 
at 4% of the interval over the entire range 
of scale-ring separation distances studied (.25 
to 4 in.) and for viewing distances up to 40 
in. Kappauf and Smith (8) report the “criti- 
cal interval” distance for graduation marks 
on dials to be .25 in. 

On the basis of these studies one might 
recommend the use of many scale rings in 
order to achieve the highest degree of ac- 
curacy, in that an increase in the number of 
scale rings on a display with a given range 
would mean that the distance represented 
between adjacent scale rings would be pro- 
portionately less. Therefore, the average in- 
terpolation error of 4% of the scale-ring in- 
terval found in the above mentioned studies 
would represent a smaller absolute error. 


1 J. M. V. is now at Washington University, St. 
Louis, Mo. 

However, it is reasonable to suspect that as 
the number of scale rings is increased there 
would be an increase in the number of errors 
resulting from the misidentification of the 
scale rings (gross errors) as well as an in- 
crease in the total amount of time required 
to determine the location of the target. 
Garner et al. (5) found that time scores in- 
creased as the number of scale rings increased 
from four to eight. In their study the Ss 
were required to report only which ring was 
marked with an “X.” No interpolation was 
required. Saltzman and Garner (10) noted 
an increase in time scores as a function of 
an increase in the number of polar coordinate 
scale rings when the S’s task was merely to 
count the total number of rings presented. 
Although the number of scale rings was not 
systematically varied, Green and Anderson 
(6) found that the polar coordinate display 
containing the most detail required more time 
than less complex displays. It is the purpose 
of this paper to report the precise nature of 
these functions where the Ss were required to 
estimate target position over a wide range of 
experimental conditions. Also, it was ex- 
pected that increasing the size of the display 
would improve speed and accuracy on those 
displays with many scale rings, since the in- 
crease in display size would increase the 
separation between the scale rings, rendering 
them easier to count and perhaps permitting 
more precise interpolation. 
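
The argument that more scale rings should reduce absolute interpolation error can be illustrated numerically. The sketch assumes a constant interpolation error of 4% of the scale-ring interval, as reported in the studies cited above, and a hypothetical display with a fixed total range of 40,000 yards.

```python
def absolute_interpolation_error(total_range_yd, n_rings, pct_of_interval=4.0):
    """Absolute error in yards when the average interpolation error is a
    fixed percentage of the interval between adjacent scale rings."""
    interval = total_range_yd / n_rings
    return interval * pct_of_interval / 100.0


TOTAL_RANGE_YD = 40_000  # hypothetical fixed maximum range of the display

errors = {
    n: absolute_interpolation_error(TOTAL_RANGE_YD, n)
    for n in (1, 3, 5, 10, 20, 40)
}
for n, err in errors.items():
    print(f"{n:2d} rings: interval {TOTAL_RANGE_YD / n:8.0f} yd, error about {err:6.0f} yd")
```

This calculation ignores gross errors and reading time, which are the costs of adding rings that the present experiment was designed to measure.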


Method 


Apparatus. The stimulus materials consisted of 
3¼ × 4-in. transparent slides projected from the rear 
onto a translucent viewing screen. The viewing 
screen was constructed with K & E No. 198 MX 
crystalline tracing paper sandwiched between two 
sheets of 4-in. Plexiglas. This arrangement gives a 
fairly realistic simulation of a PPI display, save for 
the lack of temporal and spatial variation in luminance. 
On each slide was a circle 2 in. in diameter. 
Within this circle were either 0, 2, 4, 9, 19, or 39 
equally spaced concentric circles. Every fifth ring 
was coded by making the ring twice the thickness 
of the other scale rings. One example of these displays 
is shown in Fig. 1. Each of the six different 
numbers of scale rings used appeared on 10 slides, 
making a total of 60 slides in all. A zero-range reference 
dot appeared in the center of each display. 
The scale rings, when projected on the viewing 
screen, were .02 in. wide on the 5-in. display and 
were proportionately wider for the 7-, 9-, and 11-in. 
displays; in each case the rings were light against a 
dark background. Five targets, each consisting of 
an arc .04 in. wide and .25 in. long (on the 5-in. 
display), appeared on each of the 60 slides. 

Fig. 1. A diagram of one of the polar coordinate 
displays showing the scale rings and targets. Note 
that every fifth ring is wider to assist in identification. 
Each scale ring represents a distance of 1,000 
yards. The contrast relations are reversed, i.e., the 
rings and targets in the experiment were bright 
traces on a dark background. 

The 300 targets presented were distributed equally 
in each quadrant of the display and from the center 
of the display to the outer scale ring. During the 
experiment the S’s head movements were minimized 
by having him place his head against a location on 
a wall directly behind his chair. The viewing screen 
was located 22 in. from S’s eyes, and the slide pro- 
jector was mounted behind the screen so that it 
could be adjusted to give the optimal image of the 
display on the screen. The display sizes used in 
the experiment were 5, 7, 9, and 11 in. in diameter. 
These sizes were obtained by varying the distance 
between the projector and the screen. The plane of 
the viewing screen was normal both to the optical 
axis of the projector and also the S’s line of sight. 
The average luminance of the scale rings and targets 
was 30 mL. and background 2 mL. 
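
The physical separation between adjacent scale rings follows from the display diameter and the number of rings. The sketch below assumes equally spaced rings with the outermost ring at the edge of the display, so that the separation is the radius divided by the number of rings; the .5-in. comparison value is Leyzorek's "critical interval" cited in the introduction.

```python
def ring_separation(display_diameter_in, n_rings):
    """Distance in inches between adjacent, equally spaced scale rings,
    counting the outer circle as one of the rings."""
    return (display_diameter_in / 2.0) / n_rings


SIZES_IN = (5, 7, 9, 11)
RING_COUNTS = (1, 3, 5, 10, 20, 40)
CRITICAL_INTERVAL_IN = 0.5

separations = {
    (size, n): ring_separation(size, n) for size in SIZES_IN for n in RING_COUNTS
}

for size in SIZES_IN:
    row = "  ".join(f"{separations[(size, n)]:.3f}" for n in RING_COUNTS)
    print(f"{size:2d}-in. display: {row}")

# Conditions whose ring separation is at least the .5-in. critical interval.
wide_enough = sorted(k for k, v in separations.items() if v >= CRITICAL_INTERVAL_IN)
```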

Experimental design. The total of 60 slides was 
divided into six groups of 10 slides each. Each of 
the 10 slides within a group contained the same number 
of scale rings, i.e., each slide contained either 1, 
3, 5, 10, 20, or 40 scale rings (including the outer 
circle described above). Each S was tested over 
four sessions. In any one session S read all six 
groups of slides (60 in all), presented at one of the 
four display sizes used. At three subsequent ses- 
sions S read the same slides at each of the three re- 
maining sizes. The order of presentation of the six 
groups of slides read at any one session and the 
order of presentation of the four display sizes were 
balanced across the group of 12 Ss by a Latin-square 
design. Each display size and each group of slides 
appeared equally often in each order position and 
were thus balanced over the entire group of Ss. The 
order of presentation of the 10 slides in a given 
group of slides was random. Sources of bias such 
as practice effects and differential difficulty of stimu- 
lus slides were thus balanced over the groups, as 
were order effects involving number of scale rings. 
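
A counterbalancing scheme of this kind can be sketched with a cyclic Latin square. The particular square used in the experiment is not given, so the cyclic construction and the assignment of three Ss to each row are assumptions made for illustration.

```python
from collections import Counter


def cyclic_latin_square(items):
    """Build an n x n Latin square by cyclically shifting `items`."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


display_sizes = [5, 7, 9, 11]       # inches
square = cyclic_latin_square(display_sizes)

# Assign 12 Ss to the four session orders, three Ss per row of the square.
orders = [square[s % len(square)] for s in range(12)]

# Count how often each display size appears at each session position.
position_counts = [
    Counter(order[pos] for order in orders) for pos in range(len(display_sizes))
]
for pos, counts in enumerate(position_counts, start=1):
    print(f"session {pos}: {dict(sorted(counts.items()))}")
```

With 12 Ss and four orders, each display size falls in each session position three times, which is the sense in which practice effects are balanced over the group.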

Subjects. The Ss were 12 men and women stu- 
dents at Antioch College. None had obvious visual 
defects. 

Procedure. A range value of 1,000 yards was as- 
signed to each scale ring, regardless of the number 
of scale rings on the display. Therefore the dis- 
play with one scale ring had a range of 1,000 yards 
and the display with 40 scale rings had a range of 
40,000 yards. In the operational situation a polar 
coordinate display is designed to represent a cer- 
tain fixed maximum range, and an increase in the 
number of scale rings does not increase this range, 
but merely divides this total range into smaller in- 
tervals. Since the numerical value assigned to the 
scale-ring interval is known to affect the accuracy 
of interpolation (4), the 1,000-yard value was as- 
signed to each scale-ring interval in order to avoid 
confounding the number of scale rings used with 
the value of the scale-ring interval. Subjects were 
instructed as follows: “We are going to show you a 
number of slides representing a series of radar scopes. 
The distance between each scale ring represents 1,000 
yards. Your task is to estimate the distance of the 
target from this center dot which represents your 
position. Therefore, this scope with 10 range rings 
has a total range of 10,000 yards. Note that every 
fifth ring is thicker to help you identify them cor- 
rectly. Give your estimate to the nearest 10 yards 
of range. Be as accurate as you can but proceed 
rapidly since you will be timed. Start at the 12 
o’clock position and call out the ranges in clock- 
wise order.” Each S had fifteen minutes of practice 
with the various displays before the experiment 
began. 


Results 


The data were recorded in yards of range 
as reported by the Ss. These data were then 
converted into error scores. The error scores 
were obtained by measuring the magnitude 
and direction of the difference between the 
reported range and the actual measured range 
for each target. The actual measured range 
was determined to the nearest 10 yards with 
the use of a machinists’ etching microscope 
by two Es independently. Redeterminations 
of discrepancies were made until each meas- 
ured range was identical for both Es. The 
error score in yards was then expressed as a 
percentage of the scale interval and as a per- 
centage of the total range of the display. In 
addition, each error score was classified as a 
gross error or as an interpolation error, or 
both, depending upon the nature and magni- 
tude of the error. A gross error is defined as 
any reported range of a target that is not in- 
cluded in the range values between the scale 
rings containing that target. Actually, the 
preponderance of gross errors were those as- 
sociated with reported target ranges that were 
approximately 1,000 yards in error. Less 
than 5% of the gross errors were greater 
than a single scale ring interval. All other 
errors were called interpolation errors. Thus
a reported range of 2,370 for a target whose 
measured range was 1,480 would be called 
both a gross error (1,000 yards) and an in- 
terpolation error (110 yards). Measures of 
the average time required to report the range 
of each target on a slide were also obtained
by recording to the nearest second the time
taken to report all five ranges on a given slide 
and dividing this figure by five. 
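
The scoring rules above can be sketched in Python. This is only a minimal illustration, assuming that the gross component is the whole number of scale-ring intervals separating the reported range from the interval that contains the target, and that the interpolation component is whatever remains. The function names are hypothetical and do not come from the original study.

```python
def split_error(reported, measured, interval=1000):
    """Split a range error (yards) into gross and interpolation parts.

    Assumption: the gross part is the number of whole scale-ring
    intervals between the reported interval and the true interval.
    """
    ring_shift = reported // interval - measured // interval
    gross = ring_shift * interval
    interpolation = (reported - gross) - measured
    return gross, interpolation


def percent_of(error, base):
    """Express an error magnitude as a percentage of a base distance."""
    return 100 * abs(error) / base


def mean_time_per_target(slide_seconds, targets_per_slide=5):
    """Average reporting time per target for one five-target slide."""
    return slide_seconds / targets_per_slide


# Worked example from the text: reported 2,370 yards, measured 1,480 yards.
gross, interp = split_error(2370, 1480)
print(gross, interp)               # 1000 -110
print(percent_of(interp, 1000))    # 11.0 (percent of the scale interval)
print(percent_of(interp, 10_000))  # 1.1 (percent of a 10-ring display's range)
```

The sign of the interpolation part shows the direction of the error; here the S read one ring too far out and then underestimated by 110 yards within the interval.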

In Fig. 2 is presented a combined plot of 
average error, expressed as a percentage of 
the total range (interpolation error), percent- 
age of gross errors, and mean time per target 
required to report range, each variable being 
plotted as a function of the number of range 
rings. These data are shown for all scope 
sizes combined. It can be seen from an ex- 
amination of Fig. 2 that interpolation error 
is a decreasing function of the number of 
range rings, while the frequency of gross 
errors and time are both increasing functions 
of the number of range rings. It should be 
noted that although the frequency of gross 
errors increases with an increase in the num- 
ber of scale rings, the magnitude of these 
gross errors, expressed as a percentage of 
total range, decreases. More than 95% of 
the gross errors are of the order of magnitude 
of one scale ring interval. A gross error of 
one interval on a 40-ring display represents 
2.5% of the total range, whereas a gross 
error on a display with five scale rings represents
20% of the total range. Thus, an increase
in the number of scale rings results in
an essentially proportional decrease in the
magnitude of gross errors.

Fig. 2. Gross errors, average interpolation error, and time required to
read target position as a function of number of scale rings, for all display
sizes combined. Each point on the graph consists of 2,400 observations.

Fig. 3. Average interpolation error as a function
of the distance between scale rings. The distance
between scale rings is determined by the number of
scale rings on the display and the display size. Since
four scope sizes and six scope types were used, 24
different scale-ring separations result. Note the break
in the function at about 4-in. separation.

Figure 3 is a graph of interpolation error 
as a function of the distance between scale 
rings. The distance between the scale rings
is determined by both the number of rings 
and the display size. Since four sizes of dis- 
play and six different numbers of scale rings 
per display were used, 24 actual distances be- 
tween markers result. These data resemble 
those reported by Leyzorek (9) who found 
the “critical interval” to fall at .5-in. sepa- 
ration for polar coordinate displays. It should 
be noted, however, that the absolute error 
magnitude continues to decrease with de- 
creasing interval sizes below .5 in. 
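
The 24 marker separations can be enumerated with a short Python sketch. It assumes that the rings are equally spaced along the radius and that the outermost ring lies at the edge of the display, so the separation is the radius divided by the number of rings; the source does not state the exact display geometry, and the names below are illustrative only.

```python
from fractions import Fraction

DIAMETERS_IN = [5, 7, 9, 11]         # display diameters used in the study
RING_COUNTS = [1, 3, 5, 10, 20, 40]  # scale rings per display


def ring_separation(diameter_in, rings):
    """Distance between adjacent rings, assuming equal spacing along the radius."""
    return Fraction(diameter_in, 2) / rings


separations = {
    (d, n): ring_separation(d, n) for d in DIAMETERS_IN for n in RING_COUNTS
}

# List the combinations from the widest to the narrowest separation.
for (d, n), sep in sorted(separations.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{d:>2}-in. display, {n:>2} rings: {float(sep):.3f} in.")
```

Under these assumptions the four sizes and six ring counts give 24 distinct separations, ranging from 5.5 in. (11-in. display, one ring) down to about .06 in. (5-in. display, 40 rings).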

Constant errors. Figure 4 contains a plot 
of the constant errors of interpolation as a 
function of the location of the target between 
scale rings. The constant error is the dif- 
ference between the mean of all the readings 
for each target and the measured position of 
that target. Positive constant errors are as- 
sociated with overestimation of target posi- 
tion, while negative constant errors are as- 
sociated with underestimation. Examination 
of Fig. 4 shows that the nature of the constant
error is affected by the number of scale
rings. When 10, 20, or 40 scale rings are
used, the constant errors tend to be negative
for target positions near the inner scale ring
and then become large and positive for target
positions just past the mid-point of the scale
ring interval, diminishing near the outer scale
ring. The constant error functions for the
10, 20, and 40 scale-ring displays were very
similar and for simplicity of graphing they
were averaged and plotted together. The
constant error functions for 1, 3, and 5
scale-ring displays were also very similar, and
again the average curve is plotted. However,
this curve is quite different from the
constant error function for the 10, 20,
and 40 scale-ring displays. The errors
associated with the displays with fewer rings are
always positive and show a peak for target
positions midway in the scale-ring
interval. With an increase in the number of
scale rings there is a proportional decrease in
the distance between scale rings. The con- 
stant error function for the 10, 20, and 40 
scale-ring displays is quite similar to that 
which Carr and Garner (3) found for refer- 
ence markers which were closely spaced. The 
constant error function for the 1, 3, and 5 
scale-ring displays, however, does not con- 
form with other studies (2, 3, 4) in which 
similar scale-interval distances were employed.

Fig. 4. Constant error as a function of the position
of the target between adjacent scale rings. The
solid line is the average curve for displays with 1,
3, and 5 scale rings. The broken line is the average
curve for displays with 10, 20, and 40 scale rings.
There are 720 observations for each point.

All three of these studies are in agreement
with respect to the constant error function in 
that the targets were judged to be too close 
to the mid-point of the interval. Therefore, 
there are positive constant errors for target 
positions below the mid-point and negative 
constant errors above the mid-point. The 
reasons that the present data for the 1, 3, 
and 5 scale-ring displays do not conform with 
the data from these other papers are not 
known. 
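
The constant error defined above (mean of all readings for a target minus its measured position) can be written as a one-line computation. The sketch below uses made-up readings purely for illustration; none of the numbers are data from the study.

```python
from statistics import fmean


def constant_error(readings, measured):
    """Mean reported range minus measured range (yards).

    Positive values indicate overestimation, negative values
    indicate underestimation.
    """
    return fmean(readings) - measured


# Hypothetical readings for a target measured at 1,480 yards.
print(constant_error([1500, 1520, 1480], 1480))  # 20.0 (overestimation)
print(constant_error([1450, 1470, 1460], 1480))  # -20.0 (underestimation)
```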

Display size. Display size had only a small 
effect upon interpolation accuracy. For all 
scopes averaged, the average error of inter- 
polation decreased linearly from 6% of the 
interval in error to 5.2% of the interval in 
error. Gross errors decreased markedly in 
a linear fashion from the smallest display 
(3.4% of the readings in error) to the largest 
display (1.7% of the readings in error). 
Mean reading time did not vary systemati- 
cally with display size. 

Practice. Throughout the four sessions 
there is continued improvement in both speed 
and accuracy. Average interpolation error 
decreased linearly from 6.2% of the interval 
in error for the first session to 4.8% for the 
last (fourth) session. Gross errors decreased 
in an approximately linear fashion from 3.4% 
for the first session to 1.5% for the last ses- 
sion. Average time per target decreased from 
5.7 sec. to 4.2 sec. from the first to the last 
sessions respectively. The curve was nega- 
tively accelerated, but an asymptote was not 
yet reached. It is likely that with continued 
practice the error and time measures would 
continue to decrease. It may thus be as- 
sumed that the values shown in the previous 
figures would be somewhat lower, in general, 
for more experienced Ss than for those Ss 
used in the present study. 


Summary 


The study was designed to investigate the 
speed and accuracy of determining target position
on a polar coordinate display as a function
of the number of scale rings. Polar
coordinate displays of 5, 7, 9, or 11 in. in 
diameter with 1, 3, 5, 10, 20, or 40 scale 
rings were used. 

Error of interpolation (in percentage of the 
total range of the display) decreased as a 
function of the number of scale rings used.
The frequency of gross errors (misidentifica- 
tion of scale rings) and the time required to 
make readings increased as a function of the 
number of scale rings. Increasing display 
size improved interpolation accuracy slightly 
and decreased the frequency of gross errors 
markedly. 

Constant errors of interpolation were found 
to be a function of the position of the target 
between scale rings and also a function of the 
number of scale rings used. 

An analysis of practice effects reveals that 
the Ss continued to improve in both speed 
and accuracy throughout the experiment. 


Received December 12, 1955.


This investigation is one of a series of
studies on the relationship of interpersonal 
perception and group effectiveness (2, 3, 4). 
It was designed to test an hypothesis which 
grew out of earlier studies on basketball and 
surveying teams (2), and military aircraft 
and tank crews (3). The present research 
was conducted in open-hearth shops of a 
large steel company in which the personnel, 
in contrast to subjects (Ss) participating in 
these earlier studies, is highly stable over 
time, and where carefully maintained produc- 
tion records are available. 

Interpersonal perception is measured here 
by means of the score, Assumed Similarity 
between Opposites, or ASo. This score re- 
flects the extent to which the subject (S) 
predicts different responses for the man with 
whom he can work best, and the man with 
whom he can work least well. We interpret 
ASo as a measure of the psychological dis- 
tance which S perceives between himself and 
his co-workers. Supervisors who predict simi- 
lar responses for their best- and least-liked 
co-workers (high ASo) are, by this inter- 
pretation of ASo, more accepting, approach- 
able individuals, while supervisors who pre- 
dict less similar responses for these workers 
(low ASo) are presumably more critical and 
analytic in their work relationships. The 
primary hypothesis to which these earlier 
group studies have led is that more productive
groups have leaders who tend to differentiate
more in perceptions of their most and
least preferred co-workers (i.e., have lower
ASo) than do leaders of less effective groups.

1 We are indebted to Messrs. J. H. Vohr, P. E.
Thomas, E. C. Sorrells, H. W. Erler, and G. H.
Warnock, of the Gary Steel Works; and to Mr. Dan
Farrell and Mr. E. W. Kempton, of the United States
Steel Corporation, for their cooperation and support
of this study. We are also indebted to Drs. L. J.
Cronbach, Eleanor P. Godfrey, Ross Stagner, and
C. F. Wrigley for their contributions to the design
and administration of the study, and to Mrs. Betty
F. Mannheim who assisted with the analysis of the
data.

The study was conducted under Contract N6-ori-07135
between the Office of Naval Research and the
University of Illinois, with F. E. Fiedler as principal
investigator.

The earlier reports in this series of studies 
have suggested some possible restrictions or 
special cases of this general hypothesis. In 
particular, three points have been noted, 
which we shall mention briefly. The data 
suggest (a) that the hypothesis holds espe- 
cially in the case of groups which accept the 
leader. This acceptance is defined in terms 
of the number of sociometric choices the 
leader receives from his group (3). (b) The
sociometric likes and dislikes of the leader 
for certain key personnel in the group may 
be important in determining the direction of 
relationship between ASo of the leader and 
the effectiveness of the group. It was noted 
that effective groups may have either low 
ASo leaders who sociometrically prefer key 
subordinates, or high ASo leaders who do not 
sociometrically choose their key subordinates. 
(c) It has also been suggested that the hy- 
pothesis may be valid only for tasks which 
require “direction-giving leadership behavior” 
(3). More effective groups engaged in tasks 
which require receptive leadership behavior 
for effective group coordination may have 
leaders with high ASo, regardless of the lead- 
er’s preferences for his key subordinates. 
We have not investigated the restrictions 
based on sociometric choice in the present 
paper, in part because of our limited sample 
size, and in part because of the possibility 
that sociometric choices become relatively 
less important in long-lived groups. Men 
who at first may find it difficult to get along 
together will, in the course of two, five, or 
ten years either learn to do so or else leave 
the group by transferring to another crew, or 
to another job. It seems reasonable to as- 
sume that men who have worked together for 
several years are rather adequately adjusted
to each other, an assumption which would
not be warranted in relatively short-lived 
groups such as military units. 


Procedure 


Sample. Management personnel in four open 
hearth shops of a large steel company participated 
in the study. These four shops are engaged in simi- 
lar operations, although equipment varies somewhat 
from shop to shop. 

Each shop is operated on a 24-hour, 7-day-week
basis with each shift or “turn” working eight hours.
Every turn has a full complement of first- and
second-line supervisors and their crews. Since one turn
is off duty in any one 24-hour period, each shop
requires four turns. A total of 16 turns thus
constitutes our sample.²

Four supervisors are in charge of each turn: one 
General Foreman, one Stock Foreman, one Pit Fore- 
man, and one Senior Melter. The General Foreman, 
along with the Stock and Pit Foreman, directs the 
supporting operations of raw material assembly and 
final steel pouring. The Senior Melter is in charge 
of steel manufacture. Depending on the number 
and size of furnaces in the shop, the Senior Melter 
supervises one or two Junior Melters and their
crews. In three of the four shops (or 12 of the 16 
turns), the Senior Melter has two Junior Melters 
reporting to him. 

Test instrument. Each available foreman and 
melter was requested to predict the responses of two 
persons he had known: (a) the man with whom he 
could work best, and (b) the man with whom he 
could work least well. (These ratees could be any- 
one with whom S had ever worked; S was not asked 
to specify their names.) If the two predictions by 
a single S are quite different, he is said to have low 
ASo; conversely, if the two predictions are quite 
similar, he is said to have high ASo. The test con- 
sisted of 40 statements such as: “I tend to join 
many organizations,” “I am often bored with peo- 
ple,” and “I am generally regarded as optimistic.” 
Each item was answered on a six-point scale rang- 
ing from “definitely true” to “definitely untrue.” 
The similarity of these two predictions, computed 
by the statistic D (1, 5), yields the index, “As- 
sumed Similarity between Opposites” (ASo). 
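
The comparison of the two sets of predictions can be illustrated with a short Python sketch. It assumes that D is the familiar profile distance statistic, that is, the square root of the sum of squared item differences; the source only cites the statistic and does not spell out the formula, so this is an assumption. The function name and the six-point responses below are hypothetical.

```python
import math


def d_statistic(profile_a, profile_b):
    """Profile distance between two sets of item responses.

    Assumption: D is the square root of the summed squared
    differences between corresponding items.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must contain the same number of items")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))


# Hypothetical six-point responses predicted for the best and the
# least preferred co-worker (only five items shown for brevity).
best = [1, 2, 5, 3, 6]
least = [4, 2, 1, 3, 6]
print(d_statistic(best, least))  # 5.0
```

Under this reading, a small D means the two predictions are similar (high ASo), and a large D means they differ (low ASo).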


Criterion 


The index of group effectiveness is based on the 
time elapsed from one “tap” (pouring of molten 
metal from the furnace) to the next tap on a par- 
ticular furnace. For economic reasons, company 
officials regard short “tap-to-tap time” as the most 
important production goal. The primary importance
of this production goal is recognized and
accepted by the foremen as well as their subordinates.

2 All four turns within a shop are under the direction
of a single shop superintendent and his assistant
superintendent. These men were not tested.

The average tap-to-tap time is about 10 hours;
two turns are, therefore, involved in preparing each 
batch of steel, or “heat,” for tapping. However, the 
tap-to-tap time scores are uniformly assigned to the 
Senior Melter in charge of the furnace at the time 
the tap is made, regardless of the length of time the 
shift has actually worked on the heat. This seems 
justifiable because the last hours of the heat are re- 
garded as more critical in the manufacturing process 
than the first few hours. In addition, randomiza- 
tion takes place because the turns do not systemati- 
cally follow one another or use the same furnaces 
in the shop. 

Using an analysis of variance of ranked data, we 
found significant differences between shops in tap- 
to-tap time. Since these differences can be attributed 
to different furnace capacities in the four shops, the 
tap-to-tap time data were standardized within shops 
by means of T scores. This procedure is designed 
to eliminate variance due to differences in equipment
and to retain the variance which may be
attributable to leadership variables.
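
Standardizing within shops can be sketched as follows. The example assumes the conventional T-score scaling (mean 50, standard deviation 10) and the population standard deviation; the source does not say which standard deviation formula was used, and the shop labels and times below are invented for illustration.

```python
from statistics import fmean, pstdev


def t_scores_within_shops(times_by_shop):
    """Convert tap-to-tap times to T scores separately for each shop.

    Assumptions: T = 50 + 10 * z, with z computed from the shop's own
    mean and population standard deviation.
    """
    scores = {}
    for shop, times in times_by_shop.items():
        mean = fmean(times)
        sd = pstdev(times)
        scores[shop] = [50 + 10 * (t - mean) / sd for t in times]
    return scores


# Hypothetical mean tap-to-tap times (hours) for the four turns of two shops.
times = {
    "Shop A": [9.0, 10.0, 11.0, 10.0],
    "Shop B": [11.5, 12.5, 12.0, 12.0],
}
for shop, values in t_scores_within_shops(times).items():
    print(shop, [round(v, 1) for v in values])
```

After this transformation every shop has the same mean and spread, so differences between shops (for example, those due to furnace capacity) no longer contribute to the variance of the criterion.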

Criterion reliability. The reliability of tap-to-tap
scores was based on an analysis of over 25,000 heats
from the 3- to 16-month period preceding testing.
We excluded the summer months on the recommendation
of company officials because of extensive
personnel shifts due to vacation schedules. An
even-month vs. odd-month split-half procedure was
employed. The estimated reliability of tap-to-tap time
over the 16 turns is .82. In order to minimize the
effects of long-range changes, e.g., in personnel or
company policy, the criterion scores used below are
based on only a part of these data, namely the 3- to
10-month period immediately preceding testing.
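
A split-half estimate of this kind can be sketched in Python. The sketch correlates each turn's even-month mean with its odd-month mean and then, as an optional step, applies the Spearman-Brown formula; the source does not say whether that correction was applied or which correlation coefficient was used, so both are assumptions here. The data are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_brown(r_half):
    """Step a half-length correlation up to full-length reliability."""
    return 2 * r_half / (1 + r_half)


# Hypothetical standardized tap-to-tap scores for six turns.
even_months = [48.0, 52.5, 55.0, 44.0, 50.5, 58.0]
odd_months = [47.0, 54.0, 53.5, 46.5, 49.0, 57.0]

r_half = pearson_r(even_months, odd_months)
print(round(r_half, 2), round(spearman_brown(r_half), 2))
```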


Results 


Table 1 presents the correlations between 
the average turn tap-to-tap time and the ASo 
(Assumed Similarity between Opposites) of 
the General, Pit and Stock Foremen, and 
Senior Melters. As the table shows, the correlation
between average turn tap-to-tap time
and ASo is significant in the predicted direction
in the case of Senior Melters and Pit
Foremen. The correlation falls short of an
acceptable significance level for Stock Foremen,
and is negligible for General Foremen.
In addition, the average ASo of the foremen
and Senior Melter on each turn is significantly
related to average turn tap-to-tap time.


Table 1

Correlations (Rho) Between ASo of Various Supervisors
and Average Turn Tap-to-Tap Time

Supervisor ASo        N*    Rho
General Foreman       15    -.13
Stock Foreman         15    -.42
Pit Foreman           14    -.72
Senior Melter         15    -.54
Supervisor average    16    [illegible]

* N varies due to missing data.
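
Rank-order correlations of the kind reported in Table 1 can be computed along the following lines. This is a generic sketch of Spearman's rho (the product-moment correlation of the ranks, with tied values given their average rank); how ties were handled in the original analysis is not stated, and the scores below are invented.

```python
import math


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation between two equal-length sequences."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ASo scores and standardized tap-to-tap scores for five turns.
aso = [12.0, 18.5, 9.0, 22.0, 15.5]
tap_time = [55.0, 47.0, 58.0, 49.0, 51.0]
print(round(spearman_rho(aso, tap_time), 2))
```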


Discussion 


Our hypothesis states that more effective 
groups have supervisors with low ASo. The 
significant correlations of Senior Melter and 
Pit Foreman ASo, as well as average super- 
visor ASo, with average turn tap-to-tap time 
support this hypothesis. 

Of particular interest is the high correla- 
tion between mean turn tap-to-tap time and 
ASo of Pit Foremen. On the surface, the 
melters appear to determine tap time, since 
it is their decision as to when the heat is 
ready to be tapped. These results suggest 
that variance in tap-to-tap time may also be 
a function of the Pit Foreman, although, con- 
versely, the ASo of these supervisors may be 
a function of the turn efficiency or of some 
related variable. On the other hand, the 
low correlation in the case of the General 
Foremen may indicate that these men have 
the least influence on turn efficiency, as 
measured by tap-to-tap time, even though the 
limited number of cases in our sample does 
not enable us to reject the hypothesis that all 
differences among the obtained correlations 
are a matter of chance. 

The fact that mean turn tap-to-tap time is 
negatively related to the mean ASo of the 
turn’s four foremen suggests that ASo scores 
within turns may be homogeneous. A ranked 
analysis of variance test shows this to be the 
case. The long-lived nature of these groups 
could cause this covariation of ASo. For ex- 
ample, if more similar Ss are more congenial 
to each other, selective factors may operate 
in the personnel placement process, such that 
similar Ss tend to be assigned to the same 
turn. Alternatively, changes in interpersonal 
perception may occur as a result of group 
processes within the turns. The possibility 
of such changes is suggested by the recent 
findings of Steiner and Dodge (6).


Summary and Conclusions


A study was conducted relating the inter- 
personal perceptions of open-hearth shop 
foremen to the productivity of their work 
units. Interpersonal perception was meas- 
ured by means of Assumed Similarity (ASo) 
tests which reflect how similar or different a 
person describes his most and his least pre- 
ferred work companions. Group effectiveness 
measures were based on output as indicated 
by “tap-to-tap” time, the time required to 
complete a “heat” of steel. This criterion 
measure has considerable stability and is re- 
garded as the most important production in- 
dex by company officials. 

Management personnel of four open-hearth 
shops of a large steel company participated 
in this study. Interpersonal perception (ASo) 
tests were administered to all available Ss. 

Significant relations were found between 
supervisor ASo and the tap-to-tap time index. 
These results are consistent with the hypothe- 
sis that more effective groups have super- 
visors who tend to predict different responses 
for their most- and least-preferred co-workers.


The customary practice in individual testing
has been for the examiner to offer the examinee
encouragement between subtests. Instructions
accompanying individual mental
tests uniformly recommend the use of
encouragement (20, p. 57; 21, p. 171; 9, p. 6).
This has been true despite a concurrent em- 
phasis throughout the development of psy- 
chological testing on standardization of test 
conditions. Encouragement has been offered 
testees as perhaps a corollary of the axiom of 
psychological testing that each test should 
measure a testee’s best possible performance. 

Not all writers on tests have given com- 
plete acceptance, however, to the use of en- 
couraging comments. Bingham (3, pp. 226- 
227), Goodenough (6, p. 305), and Terman 
and Merrill (20, p. 57) recommend that en- 
couragement be used with caution and mod- 
eration. Cole (4) contends that encourage- 
ment may interfere with rapport and reduce 
the validity of test results. 

If a testee’s best possible performance and 
valid test results are to be achieved, indi- 
vidual differences between testees should be 
taken into account. Since encouragement may 
have different effects upon different testees, it 
would be of value to know, in advance of in- 
dividual testing, which testees would benefit 
from encouragement and which would not. 
The present experiment used measured anx- 
iety in an effort to differentiate subjects (Ss) 
whose performance encouragement improves 
and those whose performance encouragement 
impairs. A variable which can so differenti- 
ate Ss could function as a barometer to indi- 
cate whether the examiner should blow hot 
or cold. 


Method 


Population sample. The sample consisted of stu- 
dents, mostly freshmen, enrolled in an introductory 
psychology course at a community college offering a 
two-year terminal program. Of the 211 students in 


1 Based upon a Ph.D. thesis accepted by New
York University in 1955.

the sample, 161 were men and 50 women. Their
ages ranged from 16 to 27, but 42 per cent were 18. 
To these 211 students assembled in six class sections, 
the investigator administered two anxiety scales for 
the purpose of dividing these students into three 
groups with varying amounts of anxiety. 

Anxiety instruments. The anxiety scales employed 
were the Taylor Manifest Anxiety Scale (19) and 
an anxiety questionnaire developed by Sarason and 
Mandler (16). The former scale tends to measure 
personality anxiety, while the latter is especially 
geared toward anxiety in the testing situation. The 
50 MMPI items selected and revised by Taylor were 
employed in this study without filler items. Mc- 
Creary and Bendig (12) found a correlation of .95 
between the 50-item form and the 225-item form 
containing 175 unscored items. The Sarason-Man- 
dler scale has 35 scored items and four fillers. 

Selection of subjects. Since this experiment was 
concerned with both the anxiety which Ss bring
with them to the testing situation and the anxiety 
evoked thereby, the two anxiety instruments were 
used jointly to create the requisite groups. It was 
also believed that two such instruments would serve 
as checks against each other to avoid the improper 
placement of some Ss in the three anxiety cate- 
gories. 

Standard scores on the two scales were averaged 
to form a combined distribution, from which were 
selected 28 Ss at the high-anxiety end of the dis- 
tribution (above the 81st percentile), 26 at the low- 
anxiety end (below the 19th percentile), and 30 in 
the middle (44th to 61st percentiles, inclusive). The
Ss in each anxiety category were later divided into 
two groups equated on the basis of total scores on 
the MacQuarrie Test for Mechanical Ability. 
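
The selection rule can be sketched in Python as follows. The sketch assumes z scores based on the population standard deviation and defines a subject's percentile as the percentage of subjects ranked below him on the combined distribution; the source does not give either detail, and the function names are hypothetical.

```python
from statistics import fmean, pstdev


def standard_scores(raw):
    """Convert raw scale scores to z scores (population SD assumed)."""
    mean, sd = fmean(raw), pstdev(raw)
    return [(x - mean) / sd for x in raw]


def anxiety_groups(scale_a, scale_b):
    """Return subject indices for the low, middle, and high anxiety groups."""
    combined = [
        (a + b) / 2
        for a, b in zip(standard_scores(scale_a), standard_scores(scale_b))
    ]
    n = len(combined)
    order = sorted(range(n), key=lambda i: combined[i])
    groups = {"low": [], "middle": [], "high": []}
    for position, subject in enumerate(order):
        percentile = 100 * position / n  # percent of Ss ranked below this S
        if percentile > 81:
            groups["high"].append(subject)
        elif percentile < 19:
            groups["low"].append(subject)
        elif 44 <= percentile <= 61:
            groups["middle"].append(subject)
    return groups
```

Averaging the two standard scores gives each scale equal weight, so a subject must score high (or low) on the combined index, rather than on one scale alone, to enter an extreme group.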

Experiment proper. About two weeks after the 
high-, low-, and medium-anxiety Ss had been selected, 
each of them was notified, by means of a mimeo- 
graphed 3 X 5 card used by the dean’s office, to re- 
port to a designated room at a designated time, al- 
most always a free period. When each S arrived at 
this room, which was furnished with a large table 
and two chairs facing each other, he was told, “Some 
of you have been selected by lot to take a short 
aptitude test. It takes about twenty minutes. Later 
on you will be given the results.” 

The investigator then individually administered to 
each S the MacQuarrie Test for Mechanical Ability.
This test, actually a battery of seven subtests, was 
selected as providing a series of timed tasks not 
definitely related to any specific aptitude or ability 
and “relatively independent of intelligence in per- 
sons of similar status” (18, p. 267). To avoid the 
possibility of some Ss’ ostensibly not caring whether
they did well on a test of “mechanical ability,” the
test was begun with the cover folded back. To as- 
sure some personal involvement, each S’s name was 
written conspicuously on the first open page. The 
test was administered without any comments by the 
investigator. 

On the basis of total scores on the MacQuarrie, 
the Ss in each of the three anxiety categories were 
divided into two groups equated as closely as possible
with regard to means and standard deviations.
About six weeks after the first MacQuarrie, each S 
was again notified to report to the designated room, 
this time the 3 X 5 card bearing the additional
phrase, “for your aptitude test results.” Upon arrival,
each S was told, “In ten minutes I am going
to give you the results of the aptitude test you took 
previously. But before we do that, you are to take 
a greatly shortened form of that test.” The investi- 
gator again individually administered the MacQuar- 
rie to each S, with the time allowances reduced suffi- 
ciently to prevent Ss’ completing any subtest, and 
thus ensuring the ensuing distribution’s discrimina- 
tive power. 

Encouragement, the independent variable, was in- 
troduced during the second MacQuarrie, which was 
administered with encouraging comments between 
subtests to the members of one group in each anx- 
iety category and without comments to the mem- 
bers of the other group in each category. The lat- 
ter three groups served as controls. The comments 
used are of the kind suggested by authorities like 
Binet (2, p. 34), Terman (20, p. 57), and Wechsler 
(21, p. 171). Comments were employed flexibly 
and in sufficient variety for adaptation to particular 
situations and to individual Ss, the aim being to 
duplicate as closely as possible the normal use of 
encouragement in individual psychological testing. 

Upon completing the second MacQuarrie, each S 
was given a 3 X 5 card bearing on one side the re- 
sults of the first MacQuarrie in terms of published 
percentile ranks and on the other side a general in- 
terpretation of the results. For interested students, 
the results of both tests were made available in the 
dean’s office in the form of rank-order lists.


Results


Comparisons were made of the performance 
on the second MacQuarrie of the encouraged 
low-anxiety group and the nonencouraged low- 
anxiety group, of the encouraged middle- 
anxiety group and the nonencouraged mid- 
dle-anxiety group, and of the encouraged 
high-anxiety group and the nonencouraged 
high-anxiety group. Two-tailed t tests were
employed to determine the significance of 
the differences between each pair of groups. 
Table 1 shows the results of these intergroup 
comparisons as regards total score, number of 
errors, and standard deviation. 

Interindividual comparisons of total scores 
were also made for the pairs of Ss matched 
during the equating of groups. These com- 
parisons reveal that eight of the 13 encour- 
aged low-anxiety Ss surpassed their nonen- 
couraged mates, while five were surpassed; 
eight of the 15 nonencouraged middle-anxiety 
Ss surpassed their encouraged mates, six were 
surpassed, and one was equaled; eight of the 
14 nonencouraged high-anxiety Ss surpassed 
their encouraged mates, five were surpassed, 
and one was equaled. Based upon a normal 
expectancy of an equal number of relative 
improvements and impairments of perform- 
ance for the encouraged Ss in each anxiety 
category, t tests yield a p < .4 for the
low-anxiety Ss, a p < .7 for the middle-anxiety
Ss, and a p < .4 for the high-anxiety Ss. 


Table 1

Differences Between Encouraged and Nonencouraged Groups on the MacQuarrie Test for Mechanical Ability

                   Low-Anxiety Ss
                   (13 encouraged, 13 not)
Factor compared    Difference    p
Total score        4.15*         <.4
No. of errors      1.00*         <.9
SD                 9.14‡

Middle-Anxiety Ss (15 encouraged, 15 not) and High-Anxiety Ss: the Difference and p entries cannot be assigned to cells; the surviving fragments read 1.67*, 4.47}, 5, <3, 74t, 43", <4, <2.

* Favors encouraged group.
† Favors nonencouraged group.
‡ SD greater for encouraged group.


Discussion


The one finding of statistical significance
indicates that encouragement is related to
increased variability of performance among
low-anxiety Ss as selected in this experiment. 
Nonsignificant trends in the same direction 
were found for the medium- and high-anxiety 
Ss. These findings accord with the findings 
of previous experiments on encouragement 
(7, 23). 

Other nonsignificant trends point to the pos- 
sibility that low-anxiety Ss perform better un- 
der encouragement. More of these Ss did 
better under encouragement than did worse 
(eight to five). The encouraged group of 
low-anxiety Ss also obtained a better aver- 
age score than the nonencouraged group. 
The direction of these trends accords with 
both experimental findings (8, 14) and theo- 
retical considerations (1, 17). Encourage- 
ment would tend to involve the self-esteem 
of low-anxiety Ss and thereby increase their 
motivation, without producing in many such 
Ss the hampering effects of stirred-up anxiety.
Having presumably little anxiety to
start with, most low-anxiety Ss would not be
likely to reach the critical point in motiva- 
tion beyond which impairment of perform- 
ance sets in. 

Although very few experiments have included
middle-anxiety Ss, theorization leads
to a similar expectation that encouragement 
would involve the self-esteem of most such 
Ss so as to bring about improved perform- 
ance, their initial level of anxiety being con- 
sidered sufficiently low not to be raised be- 
yond the point of efficient performance. This 
expectation was not borne out by the find- 
ings, which indicate (though at a chance 
level) that the performance of more middle- 
anxiety Ss was impaired than improved un- 
der encouragement (eight to six). 
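The phrase "at a chance level" can be checked with simple arithmetic. The sketch below is not the analysis used in the study, which relied on t tests; it substitutes an exact two-sided sign test, and it assumes that Ss showing no change are simply left out of each split. The function name is hypothetical.

```python
from math import comb


def two_sided_sign_test(n_better: int, n_worse: int) -> float:
    """Exact two-sided binomial (sign test) p value under a 50/50 null."""
    n = n_better + n_worse
    k = max(n_better, n_worse)
    # Probability of a split at least this lopsided in one direction.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double for the two-sided test, capping at 1.
    return min(1.0, 2 * upper_tail)


# Splits reported in the text (assumption: unchanged Ss are not counted).
low_anxiety = two_sided_sign_test(8, 5)     # eight better, five worse
middle_anxiety = two_sided_sign_test(6, 8)  # six better, eight worse
print(round(low_anxiety, 2), round(middle_anxiety, 2))  # prints 0.58 0.79
```

Under these assumptions neither split departs from an even division, which agrees with the description of both trends as nonsignificant.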

The effect of encouragement on middle- 
anxiety Ss, it should be emphasized, is of 
major importance. By virtue of their selec- 
tion from the more heavily populated middle 
of the distribution, such Ss represent a larger 
proportion of the total population than do Ss 
selected from either tail of the distribution. 
If encouragement has the effect of producing 
improved performance in some middle-anxiety 
Ss and impaired performance in others, the 
latter result would reflect an adverse influ- 
ence upon a relatively large number of Ss. 

Studies of anxiety (5, 13, 22) and stress 
(10, 11, 15) indicate that a high level of 
anxiety, whether existent or induced in Ss, 
generally brings about impaired performance, 
but occasionally causes improvement. The 
level, though high, may in some Ss not ex- 
ceed the point of maximum efficiency. An 
increase in errors without a decrease in total 
score has been found frequently. This latter 
finding appears corroborated in the present 
study, although at a nonsignificant level. 
While the encouraged group of high-anxiety 
Ss obtained a better average score than the 
nonencouraged group, it made more errors. 
Corroborative though nonsignificant, too, is 
the present finding that more high-anxiety 
Ss did worse under encouragement than better 
(eight to five). 

The inconclusive findings of this study may 
be due to a weakness in the experimental de- 
sign, for the Ss’ familiarity with the Mac- 
Quarrie test on taking it the second time 
may have reduced their anxiety and, in turn, 
the possible differential effects of encourage- 
ment on their performance. 

The problem approached in this study 
seems worthy of further investigation. If en- 
couragement produces higher scores for some 
testees, to that extent it may add to accuracy 
of measurement, since accuracy of measure- 
ment depends in part upon testees’ perform- 
ing maximally. If encouragement yields 
lower scores for some testees, however, to 
that extent it would seem to detract from ac- 
curacy of measurement. 

Also of importance is the possible effect 
upon those testees whose scores may be low- 
ered as a result of encouragement. If, as 
both experimentation and theorization have 
indicated, encouragement tends to heighten 
self-esteem involvement in the testing situa- 
tion, Ss with unwarrantedly low scores may 
suffer an unwarranted blow to their self- 
esteem. Another danger is that such Ss may 
acquire a distorted notion of that segment of 
themselves which has been measured. Such 
distortion may do damage to the realistic 
self-evaluation considered to be one of the 
goals of counseling and psychotherapy. 

Should further investigation confirm these 
possibilities, caution would seem to be indi- 
cated in the use of encouragement in indi- 
vidual psychological testing. Its blanket use 
might preferably be superseded by its selec- 
tive use, once an adequate criterion for selec- 
tion has been discovered or developed. 


Summary 


To determine the effect of encouragement 
on the individual test performance of Ss with 
varying amounts of anxiety, two anxiety 
scales were first administered to a sample of 
college students. From the combined dis- 
tribution of anxiety scores, three groups of 
Ss were selected and designated low-, me- 
dium-, and high-anxiety. To each S was in- 
dividually administered the MacQuarrie Test 
for Mechanical Ability, no comments being 
made by the examiner. On the basis of 
scores on this test, each anxiety category was 
divided into two equated groups. Six weeks 
later the test was again individually adminis- 
tered to each S, this time encouraging com- 
ments being offered between subtests to one 
group in each category but not to the other. 

Two-tailed t tests revealed only one sig- 
nificant finding: the performance of the low- 
anxiety Ss displayed increased variability un- 
der encouragement. The possible pertinence 
of nonsignificant trends to previous experi- 
mental findings and to theoretical considera- 
tions was discussed. The disadvantage to 
the design of the Ss’ taking the same test 
twice was mentioned. It was suggested that 
further investigation of the problem is mer- 
ited, so that testees whose performance is 
adversely affected by encouragement may 
eventually be detected in advance and tested 
without encouraging comments. 


The problem of disciplinary action in in- 
dustry has long plagued supervision at all 
levels, especially at the first line. Although 
there are many approaches to the problem, 
they can be roughly classified into two main 
types: judicial and human relations. The 
judicial approach is characterized by an at- 
tempt to determine the rightness or wrong- 
ness of an employee's actions in a particular 
situation. If the worker was “wrong,” the 
supervisor metes out the predetermined pun- 
ishment. The emphasis is on the solution to 
the immediate problem rather than on the 
possible consequences of the decision. Get- 
ting the facts, screening out opinions, and 
finally weighing the evidence are important 
steps in the judicial approach. 

The human relations approach is charac- 
terized by an emphasis on problem solving. 
The question of rightness or wrongness of be- 
havior is subordinate to the question of “How 
can I encourage this worker to perform in a 
desirable manner?” As in most problem solv- 
ing, the supervisor’s behavior is characterized 
by flexibility and adaptiveness, with the re- 
sult that a variety of solutions may be fol- 
lowed on different occasions in gaining the 
same objective. 

The study reported uses a role-playing case 
to determine how supervisors behave in a 
situation involving a disciplinary problem. 
Multiple role playing (1, 2) is used in order 
to permit a comparison of outcomes reached 
by different participants under identical test 
conditions. 


Method 
Subjects 


This study was conducted during the Foremen’s 
Conference at the University of Michigan (April, 
1954). Supervisors, representing a wide variety of 
industries and several levels of management, partici- 
pated. The program was repeated on two successive 
days and data were obtained from over 500 indi- 
viduals. 


Procedure for Role Playing 


After hearing a lecture on the topic of attitudes 
and how to deal more effectively with them, the 
audience was divided into 12 workshops averaging 
approximately 42 men, each being conducted by a 
trained conference leader. In dividing the audience 
into workshop groups, care was taken to see that 
two men from one company would not be in the 
same group. 

When the workshops had convened in their sepa- 
rate rooms, each conference leader informed his 
group that they would participate in a case study 
called the “No Smoking” problem. Since this case 
involved three persons, the men were asked to form 
three-man groups so that many sets of persons 
could study the case independently. A little time 
was then spent to give the men a general idea of 
the role-playing approach to case studies. The con- 
ference leader then read some general instructions 
describing the job and the working conditions, and 
also mentioned the fact that the foreman had just 
laid off a worker for a period of three days for vio- 
lating the company smoking rule. Separate instruc- 
tions were supplied for the part each member of the 
group was to play. These were the foreman, the 
worker, and the union steward. The worker was 
not involved in the role playing, but was present 
only to observe so that he could later pass judg- 
ment on his satisfaction with the outcome. It was 
the steward’s objective to get the foreman to re- 
verse his decision, and it was the interview between 
the foreman and the steward that was role played. 
The situation described in the roles made it clear 
that a violation had occurred, that the worker knew 
he was violating a rule, and that there was a spe- 
cific penalty. However, the worker felt he could 
not afford the layoff, and because the steward re- 
garded the employee to be a conscientious worker 
who sneaks fewer smokes than others, he was willing 
to make an issue out of the incident. After 20 min- 
utes of interaction, role playing was terminated and 
preparations for analysis and discussion were made. 


Procedure for Obtaining Data 


Data on the following points were collected from 
one three-man team at a time: 

1. The solution or decision reached. 

2. The type of interaction between steward and 
foreman. Three classifications were described: (a) 
argument (each person presents his side of the case 
and appears unsympathetic with the other’s side) ; 
(b) problem-solving discussion (each tries to under- 
stand the situation of the other and both proceed to 
work out a solution to the conflict in their situa- 
tions); (c) intermediate type of interaction. Par- 
ticipants always reached agreement on the classifica- 
tion. 

3. Satisfaction with the solution. Each partici- 
pant reported his feeling about the decision. 

4. Worker’s response to three questions: (a) Were 
you satisfied with the steward’s defense of you? 
(b) Will you vote for this steward at the next un- 
ion election? (c) Will your work suffer because of 
the way your case was handled? 

5. Steward’s intention to file a grievance. 


Results 


The outcomes of the discussions have been 
grouped into three general classifications, as 
follows: (a) no decision, (b) full layoff, and 
(c) adjusted solutions. The “no decision” 
classification includes some incomplete dis- 
cussions but most of them were deadlocks in 
which each refused to give in, but the fore- 
man nevertheless was reluctant to carry out 
the penalty without the steward’s support. 
The “full layoff” classification means that the 
foreman carried out the penalty as pre- 
scribed by the company rule. This was a 
three-day layoff, and since there was no ques- 
tion about the violation the “judicial” ap- 
proach called for this solution. The “ad- 
justed” classification includes all cases in 
which the foreman made some adjustment. 
Solutions in which an agreement is sought 
stem from the “human relations” approach. 
These ranged from reducing the penalty to 
overlooking the incident. Nearly half of the 
adjusted solutions were agreements to reduce 
the three-day layoff to a reprimand or warn- 
ing. In the case of adjusted solutions the 
steward usually agreed to support the no- 
smoking rule in the future. 

The results are shown in Table 1. It is 
important to note that 52%, or better than 
half of the foremen, did not follow the letter 
of the rule, despite the fact that the rule was 
clear and had no provision for leniency. In- 
cluded in this 52% are 9% decisions to con- 
sult higher management in order to get per- 
mission to make an exception or to let higher 
management make the decision, but the other 
43% of the foremen took it upon themselves 
to follow the “human relations” approach. 
A little more than a third (35%) of the fore- 
men had the courage, the confidence, or what- 
ever trait is needed, to carry out the letter of 
the law and avoid the possible charge of dis- 
criminatory practice. Only 13% failed to 
settle the problem in the allotted time. The 
consequences of these three types of out- 
comes are apparent from the remainder of 
the table. 


Maier and Lee E. Danielson 


Table 1 


Satisfaction with Various Solutions 


                                  No Decision    Full        Adjust- 
                                  Reached        Layoff      ment 

Instances in 172 groups           23 (13%)       60 (35%)    89 (52%) 

Satisfaction with solution (%) 
  Foreman 
  Steward 
  Worker 
Interaction (%) 
  Argument 
  Intermediate 
  Problem solving 
Worker’s reaction (%) 
  Satisfied with defense 
  Will vote for steward 
  Will reduce production          43             40          6 
Steward will file grievance (%)                              2 


* Workers cannot express their satisfaction with the outcome 
because they are uncertain as to whether they will be laid off. 
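The percentages quoted for the three outcomes follow directly from the group counts given in Table 1 and in the Summary (23, 60, and 89 of 172 groups). The short check below only illustrates that arithmetic; the variable names are chosen here for illustration.

```python
# Group counts for the three outcomes, as reported in the study.
counts = {"no decision": 23, "full layoff": 60, "adjusted solution": 89}

total_groups = sum(counts.values())  # 172 three-man groups

# Share of groups reaching each outcome, rounded to whole percentages.
percentages = {
    outcome: round(100 * n / total_groups) for outcome, n in counts.items()
}
print(total_groups, percentages)
```

The rounded shares reproduce the 13%, 35%, and 52% cited in the text.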

Satisfaction for foreman, steward, and 
worker increases together and is greatest for 
adjusted solutions and least for cases in 
which no decision is reached. For each per- 
son involved, the difference in satisfaction for 
full layoff and adjusted solutions is signifi- 
cant at better than the 1% level of confi- 
dence. 

The type of solution reached also is re- 
lated to the type of discussion that occurred. 
The problem-solving type of discussion was 
associated primarily with adjusted solutions, 
whereas argumentative approaches led to 
deadlocks and judicial solutions. A chi- 
square test for the relationship between type 
of meeting and type of outcome is significant 
at the 1% level of confidence. 
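The cell frequencies behind this chi-square test are not reproduced in the text. As a minimal sketch of how such a test of independence is computed, the example below uses invented cell counts; only the row totals (23, 60, and 89 groups) come from the study. The function name is hypothetical, and the statistic is compared with 13.277, the tabled 1% critical value for 4 degrees of freedom.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if outcome and interaction were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# HYPOTHETICAL cell counts, invented for illustration only.
# Rows: no decision, full layoff, adjusted solution.
# Columns: argument, intermediate, problem solving.
counts = [
    [14, 7, 2],
    [30, 20, 10],
    [9, 30, 50],
]

statistic, df = chi_square_independence(counts)
CRITICAL_01_DF4 = 13.277  # tabled chi-square value, 4 df, 1% level
significant = statistic > CRITICAL_01_DF4
print(round(statistic, 1), df, significant)
```

With counts patterned in this way (arguments concentrated in deadlocks and layoffs, problem solving in adjustments) the statistic far exceeds the critical value, which is the kind of relationship the text describes.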

The third set of comparisons, called worker 
reactions, shows the workers to be most often 
satisfied with their stewards when an adjust- 
ment results, and least often satisfied when 
given a full layoff. The percentages of work- 
ers who are satisfied with the steward, and 
percentages of those who will vote for him 
again when an adjustment results are each 
significantly different (at the 1% level of 
confidence) from the percentages obtained on 
these two items with the other two outcomes. 
The adjusted solution also resulted in only 
6% of the workers saying that they would 
reduce production while 40 and 43%, re- 
spectively (both significant at the 1% level 
of confidence), indicated reduced production 
for “full layoffs” and “failure to reach a de- 
cision.” 

A comparison of the number of stewards 
who will file grievances may be used as a 
measure of the extent to which various solu- 
tions actually failed to lead to a final settle- 
ment of the dispute. Table 1 clearly shows 
that only when an adjustment is reached can 
there be any degree of confidence that the 
problem has been settled. Both of the other 
two outcomes led to more than 40% of the 
stewards saying that they intended to file 
grievances, and these frequencies are signifi- 
cantly different (at less than the 1% level of 
confidence) from the 2% figure obtained 
when an adjustment is reached. 

A breakdown of the adjusted solutions that 
were lumped together in the last column of 
Table 1 includes the following: 


1. Reduced layoff (either two days or one). 

2. Forgiven (no layoff and no reprimand 
stated, but a warning may be implied). 

3. Warning and reprimand (it is made clear 
that the employee is at fault and that the 
next person violating the rule will be laid 
off; violation may be entered in personnel 
record). 

4. Consulting higher management (to ob- 
tain permission to make an exception, to 
change the rule or make a special decision 
in the case). 

5. Consulting workers (to determine what 
should be done in this case and be willing to 
abide by decision). 

6. Other (refers to the few that cannot be 
classified in above categories). 

Table 2 shows the satisfactions, the type 
of interaction, the worker’s reaction, and the 
steward’s follow-up for each of these adjusted 
solutions. Since the numbers of cases in some 
of the categories are rather small, the relation- 
ships obtained can be no more than suggestive. 


Table 2 


Satisfaction with Different Adjustments 


                                  Reduced              Warning,     Consult      Consult 
                                  Layoff    Forgiven   Reprimand    Management   Workers 

Instances in 89 groups                                 39           15           6 

Satisfaction with solution (%) 
  Foreman 
  Steward 
  Worker 
Interaction (%) 
  Argument 
  Intermediate 
  Problem solving 
Worker’s reaction (%) 
  Satisfied with defense 
  Will vote for steward 
  Will reduce production 
Steward will file grievance (%) 


Discussion 


In another investigation dealing with a 
disciplinary problem the writers¹ found fore- 
men reluctant to enforce a safety rule as evi- 
denced by the fact that only 7% of them 
laid off the man who violated the rule. The 
foremen gave two major reasons for not fol- 
lowing the letter of the rule: (a) the penalty 
of a three-week layoff was too strict, and (b) 
the foremen used in the case were not sure 
that a violation had occurred. The fact that 
45% of the workers admitted the violation 
indicates that the second reason given was 
more of an excuse than a reason for not ap- 
plying the penalty. 

In the present study the rule has been 
made less strict and the doubt as to whether 
a violation has occurred has been removed. 
Although the prescribed penalty is now ap- 
plied in 35% of the instances, 52% of the 
foremen fail to follow the letter of the law 
and instead either reduce or omit the penalty. 
The other 13% are unable to resolve the 
problem in the time allowed. Regardless of 
how one feels about rules, the fact remains 
that the persons who are intrusted to apply 
them are not doing so to the extent that is 
ideally supposed. If a rule is used as a for- 
mula to treat everyone in the same way, it is 
not accomplishing this objective. If fair treat- 
ment is a sound management objective, it is 
reasonable to question whether fairness can 
be legislated in a company. Different fore- 
men permit their own feelings and attitudes 
to determine how a violator will be treated, 
and they apparently accept this as a practical 
thing to do. Thus human factors influence 
the methods of dealing with people as soon 
as the authority figure comes face to face 
with the person who is to be punished. 

The present experimental data demonstrate 
not only that foremen are inclined to use 
what we have called the human relations ap- 
proach, but that this approach is more likely 
to produce desirable results and satisfaction 
for all concerned than is the judicial ap- 
proach, which is characterized by a consid- 
eration of the factual evidence and the de- 
termination of innocence or guilt. 

¹ L. E. Danielson and N. R. F. Maier. Super- 
visory problems in decision making: an experimental 
study of safety. Unpublished manuscript. 

What are the forces causing foremen to 
shy away from following company rules and 
to avoid the judicial approach? Group dis- 
cussions in connection with problems of this 
kind reveal two kinds of reasons. 

1. Foremen believe they can get better co- 
operation from their men if they treat them 
with consideration. Most foremen today be- 
lieve that a layoff will not solve the problem 
if one uses case studies to present a particu- 
lar situation. In general discussions, how- 
ever, they will argue in favor of rules and 
the need for consistency. 

2. Foremen are reluctant to risk a walk- 
out. It is their belief that if a grievance re- 
sults from disciplinary action, the company 
is likely to reverse the foreman’s decision. In 
any case they do not cherish the emotional 
problems involved in grievance proceedings, 
and even if backed up by the company, they 
fear they lose out in the estimation of their 
superiors if they figure in the trouble that 
has been caused by grievance proceedings. 

The first of these factors is emotional in 
nature in that there is a failure to accept the 
job of punishing a good employee. The sec- 
ond reason is more intellectual in nature and 
perhaps is a conclusion that has been reached 
through bitter experience. Foremen discover 
that they must learn to get along both with 
their superiors and with the union repre- 
sentatives. They have no final authority or 
power since the union has taken the big stick 
from them, and as a consequence they must 
use their wits. The judicial approach as- 
sumes that force is available and this is some- 
what unrealistic in our present society. The 
human relations approach respects the feel- 
ings of people and these feelings become a 
factor in reaching decisions. In the present 
study the foreman had all the objective facts 
on his side and the steward had little more 
than “feelings” to support his position. The 
“feeling” side of the issue was strong enough 
to win adjusted solutions in 52% of the cases. 
That these adjusted solutions were not vic- 
tories achieved by the steward through threat 
of a walkout is indicated by the fact that 
satisfaction with the outcome was highest 
for foremen when adjusted solutions were 
reached. No foreman experienced the ad- 
justed solution as a defeat. 

The experimental findings support the con- 
clusion that rules and penalties can no longer 
be regarded as effective procedures for con- 
trolling behavior or maintaining discipline. 
This means that new ways of controlling be- 
havior must be found. These must allow the 
foreman sufficient freedom to act so that he 
is not restrained by rules and can use his 
discretion. However, if the human factor is 
increased by moving away from judicial to 
the human relations approach, it means that 
foremen must be selected and trained to use 
human relations properly. In other words, 
foreman training is an essential part of a 
motivation program utilizing positive incen- 
tives. 

Summary 


In order to study the kinds of issues in- 
volved in a practical disciplinary problem, 
industrial supervisors were placed in a role- 
playing situation requiring that disciplinary 
action be taken. One person played the part 
of the supervisor, another the part of the un- 
ion steward, and a third person identified 
himself with the worker who was to be disci- 
plined. The background of the case made 
the violation of a no-smoking rule clear-cut 
so that a three-day layoff was in order. The 
steward intervened, however, and his func- 
tion was to get the foreman to change his 
decision. 

The results obtained are as follows: 


1. A total of 89 (52%) foremen altered 
their decisions and reached adjusted solu- 
tions. They tended to follow the human re- 
lations approach. 

2. A total of 60 (35%) foremen persisted 
in their decisions and were governed by the 
fact that the worker was guilty. They fol- 
lowed the judicial approach. 

3. The remaining 23 foremen (13%) failed 
to settle the matter in the time allowed. 
They were reluctant to change their decisions 
and also hesitated to take a stand. 

4. The human relations approach was more 
successful than the judicial approach in that 
(a) satisfaction for foremen, stewards, and 
workers was greater; (b) the interview was 
more of a problem-solving type discussion 
than an argument; (c) the worker was more 
inclined to be satisfied with the steward; (d) 
the worker was less inclined to reduce his 
future production; and (e) the steward was 
less inclined to file a grievance. 

5. Adjusted solutions varied in nature, but 
more than half of them omitted the three- 
day penalty altogether. 


It is concluded that rules hamper the su- 
pervisor and place him in the awkward po- 
sition of either showing disrespect for higher 
management or a disregard for the feelings 
of his men. New ways in discipline must be 
sought and these require training in human 
relations. Rules can function only when 
power to enforce them exists. Even then 
they do not create positive motivation. In 
the absence of power, foremen must be al- 
lowed and trained to use human relations 
skills. 


Received November 23, 1955. 


Psychologists long have been concerned 
with the effect of response sets on personality 
test scores. Of particular interest is the set 
leading an individual to respond to specific 
test items according to conscious or uncon- 
scious needs to appear as a superior person. 
In order to assess the presence of this set, 
and to correct scores for its effects, special 
scales have been developed, one of the most 
familiar being K, the suppressor scale for the 
Minnesota Multiphasic Personality Inven- 
tory. Several MMPI scales are corrected for 
such “defensiveness” by the addition of vari- 
ous amounts to raw scores, depending on the 
value of K received by the subject. 

A different approach to the study of this 
response set has been employed by Edwards 
(2). Using an inventory of his own design, 
Edwards had judges rate items on a 9-point 
scale of desirability. The inventory also was 
given as a personality test to another sample. 
The probability that an item would be en- 
dorsed (i.e., answered “true”) in the testing 
situation was found to be highly related to 
its judged desirability, the correlation be- 
tween these variables being .87. Edwards 
suggests two factors might underlie this re- 
lationship: (a) traits judged to be desirable 
are just those which are common in the popu- 
lation, and (b) subjects in general attempt 
to give good impressions of themselves when 
taking personality tests. Both factors are 
important in connection with the problem of 
defensiveness. 

The primary aim of the two experiments 
to be reported was to determine the extent 
to which the relationship discovered by Ed- 
wards holds for other more widely used in- 
ventories, specifically, the MMPI. Experi- 
ment I, in addition, was designed so that a 
scale with a maximal K correction, the Sc 
(Schizophrenia) scale, and a scale without a 
K correction, the D (Depression) scale, could 
be compared with respect to the relationship 
between social desirability and probability of 
endorsement. Experiment II investigated this 
relationship in the K scale itself, in order to 
obtain evidence regarding the validity of 
that dimension. 


Experiment I: Sc and D 


Method. Random samples of 25 of the 60 D 
items and 32 of the 78 Sc items were reproduced in 
mimeograph form, four items being common to both 
samples. These items constitute slightly more than 
40% of the two populations of scale items. 

The selected items were rated on a 9-point scale 
of social desirability by 43 male and 44 female un- 
dergraduates at Michigan State University. The 
judges were instructed to make their ratings in ac- 
cordance with their opinion as to how “people in 
general” felt about the attitudes expressed by the 
items. The social desirability of an item then could 
be expressed by its median rating on this scale.¹ 
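The median ratings reported in this study carry one decimal place even though the judges used whole-number categories, which suggests, though the text does not say so, that the medians were interpolated within the median category. The sketch below contrasts the ordinary median with the interpolated (grouped) median from Python's standard library; the ratings are hypothetical.

```python
from statistics import median, median_grouped

# HYPOTHETICAL ratings of one item by ten judges on the 9-point scale.
ratings = [3, 4, 4, 4, 5, 5, 5, 5, 6, 7]

# Ordinary median: the midpoint of the two middle ratings.
plain_median = median(ratings)

# Interpolated median: each rating k is treated as the interval
# k - 0.5 to k + 0.5, and the median is located inside that interval.
interpolated_median = median_grouped(ratings, interval=1)

print(plain_median, interpolated_median)
```

Here the ordinary median is 5 while the interpolated value is 4.75, which shows how fractional medians can arise from whole-number ratings.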

Probability of endorsement was obtained by count- 
ing the pertinent responses of 64 male and 42 fe- 
male undergraduates who had taken the MMPI in 
connection with an earlier study at Michigan State. 
The probability of endorsement of an item is the 
decimal ratio of the number of subjects answering 
“true” to the total number answering the item. 
Thus, the item rated the most socially undesirable, 
“Most of the time I wish I were dead,” had a prob- 
ability of endorsement of .01. The neutral item, “I 
am easily awakened by noise,” had an endorsement 
value of .35, while the item judged most desirable 
of all, “I loved my mother,” had a probability of 
endorsement of .93. 
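The definition of probability of endorsement given above can be written out as a short computation. This is a sketch under stated assumptions: the response list is hypothetical, blank answers are coded as `None`, and the function name is chosen for illustration.

```python
def endorsement_probability(responses):
    """Proportion of "true" answers among subjects who answered the item.

    responses: iterable of True ("true"), False ("false"), or None (blank).
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("no subject answered the item")
    # True counts as 1 and False as 0, so the sum is the number of "true"s.
    return sum(answered) / len(answered)


# HYPOTHETICAL item: 106 subjects, 4 left it blank, 36 answered "true".
responses = [True] * 36 + [False] * 66 + [None] * 4
p = endorsement_probability(responses)
print(round(p, 2))  # prints 0.35
```

Blank answers are excluded from the denominator because the ratio is taken over the total number answering the item, not the total number tested.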


Results. Most of the items received unde- 
sirable ratings. Of the Sc items, there were 
24 rated as 4.0 or less (undesirable), 4 be- 
tween 4.0 and 6.0, and 4 as 6.0 or more (de- 
sirable). Of the D items, 13 were rated as 
4.0 or less, 7 had ratings between 4.0 and 6.0, 
and 5 received ratings of 6.0 or more. The 
Sc scale has relatively fewer neutral items 
than D. This may be one reason why a cor- 
rection is applied for Sc but not for D. In 
this connection, Wiener (5) notes that the 
Sc scale lacks “subtle” items. 

¹ Edwards converted median ratings into values on 
an equal-interval scale by applying the method of 
successive categories (1). Scale values in the pres- 
ent study were so highly correlated with rated values 
that the latter measures are used for purposes of 
simplicity. 
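The statement that Sc has relatively fewer neutral items than D can be verified from the counts reported above. The snippet below merely restates that arithmetic; the dictionary layout is chosen for illustration.

```python
# Item counts by rated desirability, as reported in the Results.
bands = {
    "Sc": {"undesirable": 24, "neutral": 4, "desirable": 4},
    "D": {"undesirable": 13, "neutral": 7, "desirable": 5},
}

# Proportion of sampled items on each scale that fall in the neutral band.
neutral_share = {
    scale: counts["neutral"] / sum(counts.values())
    for scale, counts in bands.items()
}
print(neutral_share)
```

The neutral band holds 4 of 32 Sc items (.125) against 7 of 25 D items (.28).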

Two kinds of relationships are worth de- 
scribing. The first is that between social 
desirability and probability of endorsement. 
The product-moment correlation between these 
variables is .82 for the D items and .89 for 
the Sc items.* Both coefficients are of the 
same order of magnitude as the correlation 
reported by Edwards for his inventory. 
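For readers who wish to see the computation, the following is a minimal sketch of a product-moment correlation between median desirability ratings and probabilities of endorsement. The six item values are hypothetical and are not the study's data; the function name is likewise chosen for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# HYPOTHETICAL items: median desirability rating and probability of endorsement.
desirability = [1.2, 2.6, 4.4, 5.0, 7.6, 8.8]
endorsement = [0.01, 0.10, 0.35, 0.40, 0.80, 0.93]

r = pearson_r(desirability, endorsement)
print(round(r, 2))
```

Each item contributes one pair of values, so the number of cases is the number of items sampled from a scale, not the number of judges or subjects.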

Fiducial limits for these coefficients cannot 
be ascertained by the usual methods, since 
both item samples were drawn from small, 
finite populations. Possibly more important 
as a source of uncertainty is the number of 
judges and number of subjects employed in 
the study. Increasing these groups in size 
undoubtedly would yield more reliable esti- 
mates of the values entering into the com- 
putations, but this fact cannot be expressed 
in terms of the standard error of a correla- 
tion coefficient. 

The foregoing indicates that a person who 
answers items in the socially desirable man- 
ner will perform much as does the average 
college student. This observation raises the 
question of the relationship of social desir- 
ability to the manner in which items are 
scored on the keys. By classifying “true” re- 
sponses into the dichotomy “keyed-unkeyed,” 
a point-biserial correlation may be computed 
using social desirability as the continuous 
variable. For the Sc items, this coefficient is 
.84; for the D items, it is .58. Thus, it ap- 
pears that the scoring of the Sc scale is more 
highly related to social desirability than is 
the case with D. On both scales, however, a 
tendency to answer in the socially desirable 
manner will result in lowered raw scores. 
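The point-biserial coefficient described above can be sketched as follows. The item values are hypothetical, and the sign convention (coding an unkeyed “true” response as 1) is an assumption made here, since the text reports only the magnitudes of the coefficients.

```python
from math import sqrt


def point_biserial(values, flags):
    """Point-biserial correlation of a continuous variable with a dichotomy.

    values: continuous scores (here, median desirability ratings).
    flags:  True or False for the two classes of the dichotomy.
    """
    n = len(values)
    group1 = [v for v, f in zip(values, flags) if f]
    group0 = [v for v, f in zip(values, flags) if not f]
    if not group1 or not group0:
        raise ValueError("both classes of the dichotomy must occur")
    mean_all = sum(values) / n
    # Standard deviation of all values, with n in the denominator.
    sd = sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("continuous variable has no variance")
    p = len(group1) / n
    mean1 = sum(group1) / len(group1)
    mean0 = sum(group0) / len(group0)
    return (mean1 - mean0) / sd * sqrt(p * (1 - p))


# HYPOTHETICAL items: desirability rating, and whether "true" is unkeyed.
ratings = [1.2, 2.5, 2.6, 3.8, 4.4, 7.6, 8.6, 8.8]
true_is_unkeyed = [False, False, False, False, True, True, True, True]

r_pb = point_biserial(ratings, true_is_unkeyed)
print(round(r_pb, 2))
```

Under this coding a high positive coefficient means that “true” is the unscored answer for desirable items and the scored answer for undesirable ones, so answering in the socially desirable manner lowers the raw score.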


Experiment II: K 


Method. In the second study, the procedure was 
similar to that used in Experiment I. A list of 29 
of the 30 K items was mimeographed; the omission 
of the thirtieth item was not noted until the data 
had been collected. Five non-K items from the ear- 
lier study were included; these plus six K items also 
present in the D and Sc samples give an overlap of 
11 items between the two lists. 

* When probability of endorsement is estimated 
from data obtained by Hathaway on a normative 
sample of 152 college males and 113 college females, 
the correlations are .92 for the Sc items and .86 for 
the D items. 

A new sample of judges was employed: 43 male 
and 34 female undergraduates at Michigan State. 
The instructions were those used in Experiment I. 
Again, social desirability is represented by the me- 
dian rating given items by the judges. 

Probability of endorsement is based on counts of 
responses in MMPI records accumulated in a sepa- 
rate experiment with students in introductory psy- 
chology classes. The records of 63 males and 37 
females were used to obtain probability of endorse- 
ment in a “typical” college sample. Records of 32 
males and 19 females having raw K scores greater 
than 19 (standard scores of 64 or more) were used 
to obtain endorsement values in a “High-K” group, 
while probability of endorsement in an “Average-K” 
group was secured from the records of 42 males and 
29 females having raw K scores ranging from 11 to 
14 (standard scores of 48-53). 

Results. An indication of the stability of 
ratings of social desirability may be had by 
comparing the judgments given to the 11 
items common to both experiments. Table 1 
presents the necessary information. Differ- 
ences are small, except in the case of Item 
142, which receives a slightly undesirable 
rating in Experiment I but a clearly undesir- 
able rating in Experiment II. On the whole, 
the judgments appear to be stable. 

Of the 29 K items studied, 16 received 
median desirability ratings of 4.0 or less (un- 
desirable), 5 had median ratings of 6.0 or 
more (desirable), while the remaining 8 had 
ratings falling between 4.0 and 6.0. The ma- 
jority of the items, in other words, express 
attitudes judged to be undesirable. 


Table 1

Median Social Desirability Ratings of Items Common
to Lists Used in Both Experiments

MMPI Booklet          Median Rating
Number             D-Sc List    K List

5                     5.0        4.8
9                     7.6        6.5
39                    2.6        2.6
89                    4.4        4.7
[illegible]           3.8        3.3
142                   4.0        2.6
[illegible]           8.6        8.5
[illegible]           8.8        8.8
[illegible]           2.5        2.2
[illegible]           4.3        4.0
[illegible]           1.2        1.1

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An estimate of the relationship between 
social desirability and probability of endorsement
was obtained by correlating these variables,
using probability of endorsement values
from the typical college group. The product-moment
correlation is .50.³ A positive relationship
exists, but its magnitude is considerably
less than those found to hold for the D
and Sc items used in Experiment I. 

The existence of this relationship might 
seem to indicate that K scores may be low- 
ered if the response set is operating, but an- 
other possibility is present. In Experiment I 
it might be assumed that the incidence of 
schizophrenia and depression was negligible 
among the college students whose responses 
determined the endorsement values for the 
Sc and D items. It is quite likely, however, 
that the sample of students, whose responses 
yielded endorsement values for the K items, 
included a considerable number who had the 
set to answer items according to their social 
desirability. As far as the records of stu- 
dents lacking the set are concerned, the cor- 
relation between social desirability and prob- 
ability of endorsement might be negligible, 
but if an appreciable number of students have 
that set, their responses would create a posi- 
tive correlation between the two variables. 
Using K to indicate the presence of this re- 
sponse set, it is possible to compare the cor- 
relation between desirability and endorsement 
in a sample apparently lacking the set with 
the correlation obtained in a sample in which 
the set presumably is present. 

The set, insofar as it is measured by K, is 
absent in the “Average-K” sample. In this 
group, the correlation between desirability and 
endorsement is .38. The set supposedly is 
present in the “High-K” group. In this sam- 
ple, the correlation is .66. From these data, 
therefore, it appears that individuals who receive
high K scores have a greater tendency
to respond to items according to their social 
desirability than is the case with subjects ob- 
taining lower K scores. The validity of K as 
a measure of the response set is demonstrated 
by this evidence. 

3 When endorsement is based on Hathaway’s data, 
the correlation is .59. 
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If a scale is to measure the presence of the 
set, its scoring should be consistent with the 
social desirability of its items. Other things 
being equal, endorsement of desirable items 
and denial of undesirable items should char- 
acterize the test performance of defensive in- 
dividuals. The scoring of the K items, how- 
ever, is not entirely consistent with this 
scheme. All 29 K items in this experiment 
are keyed so that denial indicates the pres- 
ence of the set. There are five items (Book- 
let No. 160, 272, 296, 461, and 502) that ob- 
tain desirability ratings of 6.0 or more, yet 
answering these items “false” (i.e., denying 
desirable qualities) is scored as indicating 
the presence of the set. 

This being the case, it might be hypothe- 
sized that the best separation between indi- 
viduals having the set and those lacking it 
would be produced by items whose scoring is 
consistent with their judged social desirabil- 
ity, in this instance the relatively undesirable 
K items. A rough test of this hypothesis 
is obtained by correlating social desirability 
with the differences in endorsement given by 
the “Average-K” and “High-K” samples to
the individual items. The difference for each
item is obtained by subtracting its “High-K”
probability of endorsement from its “Average-K”
value. The product-moment correlation
between these differences and social desirability
ratings is -.49, a result indicating
that the best separation between the two
groups tends to be provided by K items
judged to be relatively undesirable. It is just 
these items that are scored in a manner con- 
sistent with their social desirability ratings. 


Discussion 


The present study demonstrates that the 
relationship between social desirability and 
endorsement of items reported by Edwards 
(2) is not confined to his inventory. Indeed, 
one might venture to speculate that it would 
hold for most conventional personality tests. 

With such high correlations as exist be- 
tween desirability and endorsement in the D 
and Sc items, the way clearly is open for mis- 
representation by subjects. A number of al- 
ternatives are available for dealing with the 
problem. Edwards suggests as a remedy the 
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pairing of items on the basis of social desir- 
ability and the institution of a forced-choice 
technique of answering. The effectiveness of 
this ipsative approach is not yet reported. 

Another possibility may be useful and does 
not entail abandoning the traditional form of 
the personality inventory. Preliminary judg- 
ing of items will reveal those at the extremes 
of the social desirability dimension. To the 
extent that these can be discarded and rela- 
tively neutral items substituted in their place, 
the scale which results should be less vulner- 
able to manipulation. Neutral items may be 
few and far between, but the more there are, 
the better the scale will be. 

A third approach is to construct a suppressor
scale which can be used to correct for
the effects of the response set. Such a scale 
should consist of items heterogeneous in so- 
cial desirability to which probability of en- 
dorsement is unrelated in a population not 
having the set. To some extent, the K scale 
of the MMPI approximates this desired state 
of affairs. Its items vary in social desirability,
and in a sample of individuals who obtained
“average” K scores, the correlation
between social desirability and probability of
endorsement is relatively low. This is not
the case with subjects obtaining “high” K 
scores; with these individuals the two vari- 
ables have a relatively high correlation. The 
scoring of the K scale, however, is not always 
consistent with the social desirability of its 
items. With a change in the scoring of five 
items, a somewhat different group of “highs” 
would have been discovered, and for them 
the correlation between social desirability and 
probability of endorsement might be even 
higher than was found using the present scor- 
ing key. 

If the K scale measures the set to respond 
according to the social desirability of the 
traits expressed by the items, it may be asked 
why certain MMPI scales and not others are 
corrected by its use—for example, why K is 
added to Sc and not to D. In the present 
study, it appears that D is vulnerable to de- 
fensiveness, although not to the extent that is 
true for Sc. 

The MMPI procedure was based on empirical
evidence. Meehl and Hathaway (4)
found that the addition of K to the Sc scale
reduced the number of “false negatives,” but 
no similar reduction was accomplished when 
K was used to correct D. A partial cause for 
this failure in the case of D lies in purely 
psychometric factors. There is an appreci- 
able item overlap between the K and D 
scales, 8 of the 30 K items being scored for 
D as well. When the scoring of an item on 
K is the opposite of its scoring on D, the net 
effect will be to cancel the item when K is 
added to D. When the common items are 
scored in the same manner for K as for D, 
the effect of adding K to D is to increase the 
weight given these items. In a scale as long 
as D (60 items), weighting items will not 
produce a useful increase in efficiency (3, p. 
447). 
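
The psychometric point about overlapping items can be made concrete with a small sketch. It is only an illustration under stated assumptions: K is taken to be added to D with unit weight, each item is scored one point when the response matches the keyed direction, and the function name and keying values are hypothetical rather than taken from the MMPI keys.

```python
def combined_points(response, d_key, k_key):
    """Points one shared item contributes to D + K for a True/False response.

    d_key and k_key are the responses scored on D and on K respectively.
    Unit weighting of K is an assumption made for this sketch.
    """
    return int(response == d_key) + int(response == k_key)


# Item keyed in opposite directions on the two scales (hypothetical keying):
# whatever the answer, the sum is 1, so the item no longer separates subjects.
opposite = [combined_points(r, d_key=True, k_key=False) for r in (True, False)]

# Item keyed the same way on both scales: the sum is 0 or 2, so the item
# simply counts double in the corrected score.
same = [combined_points(r, d_key=False, k_key=False) for r in (True, False)]
```

With opposite keying the contribution is constant across responses, which is what cancelling the item amounts to; with identical keying the item's weight is doubled.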

The fact that K is not used to correct cer- 
tain MMPI scales, therefore, does not indi- 
cate that scores on these dimensions cannot 
be influenced by a set to respond according 
to the social desirability of the items. It 
does not, on the other hand, demonstrate 
that the K scale fails to measure the presence 
of the set. What is suggested is that a re- 
vision of the K scale, which might involve 
the elimination of item overlap, a change in 
the scoring of certain items, and the addition 
of new items having the formal properties of 
suppressors, could produce increased validity
in the presently uncorrected MMPI scales. 


Summary 


Two experiments were described in which 
responses to items from three MMPI scales 
were related to the judged social desirability 
of these items. 

In Experiment I, ratings of the social de- 
sirability of random samples of 25 D and 32 
Sc items were correlated with the probabili- 
ties that the items would be endorsed when 
the MMPI was used as a personality test. 
The correlations between social desirability 
and probability of endorsement are .82 for 
the D items, and .89 for the Sc items. The 
scoring of both scales was found to be sys- 
tematically related to the social desirability 
of the items. 

In Experiment II, social desirability ratings
and probabilities of endorsement were corre- 
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lated using a sample of 29 of the 30 K items. 
In a typical college sample, these variables 
correlate .50. In an “Average-K” group, the 
correlation is .38, while in a “High-K” sample 
the correlation is .66. The results were inter- 
preted as demonstrating the validity of K as 
a measure of the set to respond to items in 
terms of their social desirability. It was 
pointed out, however, that the scoring of sev- 
eral K items was not consistent with their 
judged social desirability. Suggestions were 
made regarding possible improvements in the 
K scale and its application to scales presently 
uncorrected for by K. 


Received November 16, 1955. 
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The Relationship Between Attitude Toward the Army and 
the Acceptance Accorded QM Items of Issue¹


James F. Parker, Jr. and Ray C. Hackman 


University of Maryland 


The U. S. Army realizes the importance of 
issuing clothing and personal equipment to 
the soldier that will meet the subjective stand- 
ards set by the soldier, and therefore is faced 
with a problem similar to the consumer re- 
search problem in industry. The preferences 
of the soldiers for new articles proposed as 
future items of issue, and the acceptance ac- 
corded them, are assumed to play a large part 
in determining the extent to which the sol- 
dier will retain an article and use it in the 
prescribed manner. The Army maintains 
several centers which conduct extensive tests 
to determine both the physical characteristics 
of new articles of clothing and personal equip- 
ment and the subjective acceptance which 
will be accorded them. In testing for the 
relative acceptability of items of a given 
class the Army frequently uses the panel 
method with several large groups participat- 
ing in the test. Reactions to the articles and 
evaluations of them are solicited from the 
panel members by means of questionnaires. 


Problem 


In part of the tests conducted by the Sur- 
vey Division of the Quartermaster Research 
and Development Field Evaluation Agency 
at Fort Lee, Virginia, subjects are selected 
from a pool of volunteers to fit whatever sizes 
are available in the articles to be tested. The 
pool of volunteers is usually large since the 
articles used may sometimes be retained at 
the conclusion of the test. A potential source 
of sampling bias exists, however, since the 
soldiers chosen to be panel members consti- 
tute a very select sample and, as such, may 


1 This study was conducted as a part of Contract 
DA 44-109qm-129 let by the Office of the Quarter- 
master General to the University of Maryland. This 
paper was reported in part at the annual meeting of 
the Eastern Psychological Association, April, 1953. 
Opinions expressed herein are those of the authors 
and do not necessarily represent those of the Office 
of the Quartermaster General or the U. S. Army. 
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not be representative of the entire Army 
population. Since these men are volunteers 
who participate in the test in addition to 
their other duties, it would appear likely 
that they are quite favorably disposed to- 
ward the Army. The principal hypothesis of 
this study was that soldiers differing in their 
general attitude toward the Army have a dif- 
ferent regard for items bearing the Army 
label and that results obtained with these 
soldiers are not applicable to the entire Army. 
In addition, it was proposed to test the ef- 
fects of three other variables on soldier ac- 
ceptance: length of service in the Army, level 
of education of the soldier, and the size of 
the city or town in which the soldier had 
lived for the major portion of his life. 


Procedure 


Four hundred enlisted men at Fort Lee were ini- 
tially selected so as to represent as wide a range of 
service experience as possible. Examination of the 
biographical information on these men showed that 
the most meaningful categories into which these men 
could be grouped were as follows: basic trainees 
with fewer than eight weeks in the Army (Group 
I); soldiers with more than six months and less than
two years of experience in the Army who had never 
been overseas (Group II); and combat veterans, 
each of whom had been in the Army for more than 
two years and had been awarded at least one battle 
star (Group III). 

Each of these groups was ordered with respect to 
general attitude toward the Army by means of a 
Guttman scale analysis, using a set of items re- 
ported previously (3). These items had been found 
to be scalable and in a pretest on a group of Na- 
tional Guardsmen were again found to be scalable. 
In order that the attitude groups used in the final 
analysis might be as different as possible, within 
each of the experience categories (Groups I, II, and 
III), only the top eight, those very favorably dis- 
posed toward the Army, a middle eight who were 
relatively neutral, and the bottom eight, who had a 
decided dislike for the Army, were used. Thus 
there were 24 cases in each of Groups I, II, and III,
or a total of 72 cases in the complete analysis. 
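
The selection rule just described can be sketched in Python. This is an illustration only: the function name and the placeholder scores are hypothetical, and centring the middle eight on the median position is an assumption, since the report does not say how the middle eight were located.

```python
def select_attitude_groups(scores, k=8):
    """Split one experience group into favorable, middle, and unfavorable sets.

    `scores` maps a soldier ID to his attitude-scale score (higher means more
    favorable). Returns three lists of IDs, each of length k.
    """
    if len(scores) < 3 * k:
        raise ValueError("need at least 3*k soldiers so the sets cannot overlap")
    ordered = sorted(scores, key=scores.get, reverse=True)  # most favorable first
    start = (len(ordered) - k) // 2  # assumed: middle set centred on the median
    return ordered[:k], ordered[start:start + k], ordered[-k:]


# Hypothetical data: 100 soldiers whose score simply equals their ID number.
demo_scores = {soldier_id: soldier_id for soldier_id in range(100)}
favorable, middle, unfavorable = select_attitude_groups(demo_scores)
```

Applied once to each of the three experience groups, a rule of this kind yields 3 x 3 x 8 = 72 cases, the total used in the analysis.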

To determine the effects of these variables on ac- 
ceptance, an instrument was devised to yield an ob- 
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jective acceptance score for each individual. As a 
first step in accomplishing this, all similar attitude 
groups were combined. That is to say, the top 
eight from each of Groups I, II, and III were com- 
bined to form a single favorable group and simi- 
larly for each of the other attitude groups. These 
resulting groups were essentially alike in all respects 
except attitude and were designated the F, M, and 
U groups, standing for favorable, middle or neutral, 
and unfavorable. 

Next, a preference scale of Quartermaster items 
ordered along a metric was established. Pretesting 
suggested that the method of paired comparisons 
could not be used for this purpose since it appeared 
to present an unrealistic task to the subject (S). 
Accordingly, the following system was devised: The 
soldier was given a list of 14 Quartermaster items, 
each being quite familiar to him. He was first asked 
to select the three articles he considered best and 
the three he considered poorest and to give a rea- 
son for the selection of each. The reasons were 
asked for primarily as a device for forcing a more 
critical evaluation of the items. Second, the S was 
asked to rank, from best to worst, the top three 
and the bottom three items. For each S, then, his 
rankings of items one, two, and three, and twelve, 
thirteen, and fourteen were obtained with no knowl- 
edge of his rankings of the eight middle items. A 
rank of 7.5 was arbitrarily assigned to each article 
not named. A preference scale for these 14 items 
was established by summing the rankings over all 
Ss and using Guilford’s method (1) for achieving a 
metric from ranked data. In Table 1 are shown the 
preference scales established by each of the three 
attitude groups. 
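
The ranking step can be sketched as follows. The item labels and the two subjects are hypothetical, and the sketch stops at the summed ranks; Guilford's conversion of those sums to a metric is not reproduced here. Note that 7.5 is the mean of the unassigned ranks 4 through 11, so each subject's ranks still sum to 105.

```python
def ranks_for_subject(items, best_three, poorest_three):
    """Return {item: rank} for one subject.

    best_three is ordered best first and receives ranks 1, 2, 3.
    poorest_three is ordered best to worst and receives ranks 12, 13, 14.
    Every article not named receives 7.5.
    """
    if len(best_three) != 3 or len(poorest_three) != 3:
        raise ValueError("exactly three best and three poorest articles expected")
    ranks = {item: 7.5 for item in items}
    for rank, item in enumerate(best_three, start=1):
        ranks[item] = float(rank)
    for rank, item in enumerate(poorest_three, start=len(items) - 2):
        ranks[item] = float(rank)
    return ranks


def summed_ranks(items, subjects):
    """Sum each article's rank over all subjects."""
    totals = {item: 0.0 for item in items}
    for best_three, poorest_three in subjects:
        subject_ranks = ranks_for_subject(items, best_three, poorest_three)
        for item, rank in subject_ranks.items():
            totals[item] += rank
    return totals


# Hypothetical labels and responses, for illustration only.
ITEMS = [f"item_{n:02d}" for n in range(1, 15)]
subject_a = (["item_01", "item_02", "item_03"], ["item_12", "item_13", "item_14"])
subject_b = (["item_02", "item_01", "item_05"], ["item_03", "item_14", "item_13"])
totals = summed_ranks(ITEMS, [subject_a, subject_b])
```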

The last section of the instrument consisted of
that part from which the individual’s acceptance
score was actually obtained. In this section, each 
S evaluated separately each article in the list of 14 
QM items used in setting up the preference scale. 
If, in his opinion, an item was generally superior to 
its civilian counterpart, he indicated his preference 
for the Army article. If, however, he could think 
of a similar civilian article which he considered to 
be, in most respects, better than the Army article,
he indicated his preference for the civilian article. 
This appeared to be a realistic task and should have 
permitted all aspects of item acceptance to operate. 

An objective acceptance score was determined for 
each individual by recording the scale value of each 
article chosen as being superior to its civilian counterpart
and then summing these scale values over all
articles chosen. It should be noted that the scale 
values used in obtaining the acceptance scores were 
those which were established by the M group, those 
neutral on the attitude scale, alone, since it was felt 
that scales produced by the other two groups would 
be contaminated by the extremes of attitude operat- 
ing in these groups. 
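
A minimal sketch of the scoring rule follows. The scale values are the M-group values as far as they can be read from Table 1 (the five lowest-ranked articles are omitted because their values are not legible), and the soldier's choices are hypothetical.

```python
# M-group scale values as read from Table 1 (an assumed reading of the scan).
M_SCALE = {
    "Boots, russet": 1.46,
    "Undershirt": 1.14,
    "Trousers, OD": 1.10,
    "Shirt, cotton": 1.01,
    "Ike jacket": 0.92,
    "Jacket, HBT": 0.88,
    "Field jacket": 0.85,
    "Cap, HBT": 0.81,
    "Blanket, wool": 0.80,
}


def acceptance_score(army_preferred, scale=M_SCALE):
    """Sum the scale values of the articles judged superior to civilian ones."""
    return sum(scale[article] for article in army_preferred)


# Hypothetical soldier who prefers the Army version of three articles.
score = acceptance_score(["Boots, russet", "Undershirt", "Cap, HBT"])
```

Under this rule, preferring a highly regarded Army article raises the score more than preferring a poorly regarded one, which is the reason for weighting by scale value rather than counting choices.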


Results 


These acceptance scores went into the nine 
cells of a two-way classification table. In 
this table one dimension represented the 
three-way categorization based on length of 
service, and the other represented the three-way
categorization based on the attitude
measure. Table 2 shows the analysis of vari- 
ance computed from this table. 


Table 1

Preference Scales for QM Items Established Through the Rankings of Articles
by the Three Attitude Groups

                    Scale                        Scale                        Scale
F Group             Value*   M Group             Value    U Group             Value

Boots, russet       1.83     Boots, russet       1.46     Boots, russet       1.44
Ike jacket          1.55     Undershirt          1.14     Ike jacket          1.02
Trousers, OD        1.45     Trousers, OD        1.10     Shirt, cotton       1.00
Shirt, cotton       1.28     Shirt, cotton       1.01     Sweater             1.00
Field jacket        1.27     Ike jacket           .92     Undershirt          1.00
Blanket, wool       1.19     Jacket, HBT          .88     Blanket, wool        .93
Undershirt          1.13     Field jacket         .85     Field jacket         .91
Cap, HBT            1.01     Cap, HBT             .81     Cap, HBT             .80
Jacket, HBT          .97     Blanket, wool        .80     Trousers, OD    [illegible]
Sweater              .92     Gloves, wool    [illegible]  Jacket, HBT     [illegible]
Gloves, wool         .82     Raincoat        [illegible]  Gloves, wool    [illegible]
Raincoat             .74     Sweater         [illegible]  Raincoat        [illegible]
Necktie              .49     Necktie         [illegible]  Necktie         [illegible]
Overcoat, wool       .00     Overcoat, wool  [illegible]  Overcoat, wool  [illegible]

* The highest number represents the article most preferred.

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Table 2 


Results of Analysis of Variance of Acceptance Scores

Source                  df    Mean Square      F        p

Attitude                 2     98.7218*      21.199    <.01
Length of Service        2      6.2524        1.342    >.05
Interaction (A X L)      4      1.9251
Error                   63      4.6567

Total                   71

* F (df = 2,60) = 4.98 at the 1% level of significance.

Examination of the analysis of variance 
table shows the effect of attitude differences 
on acceptance to be highly significant. This 
substantiates the initial hypothesis that there 
is a definite quantitative difference in the ac- 
ceptance which is accorded QM items of 
issue by soldiers representing different por- 
tions of the attitude continuum. Generaliz- 
ing from this, we would expect soldiers who 
have a low regard for the Army to dislike 
items of issue bearing the Army label. This 
finding reaffirms a suspicion that the accept- 
ance of QM items is based in part on things 
other than simply the quality and/or utility
of the items. 

Length of service (as defined) was not a 
significant source of variation. It would 
seem that a long period of time in the Army, 
and the consequent greater amount of experience
with military equipment, has little effect
on the way in which the soldier evaluates
the articles of clothing and personal
equipment he uses and wears. If the initial 
acceptance level of an article is low, mere 
passage of time will not increase its accepta- 
bility for the user. 

No significant interaction effect between 
attitude differences and length of service was 
observed. 
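
The F ratios in Table 2 can be checked by dividing each mean square by the error mean square. The sketch below does only that arithmetic; the variable names are illustrative, and the 4.98 criterion is the tabled value quoted in the table footnote for 2 and 60 degrees of freedom, so it applies to the two main effects only.

```python
# Mean squares as printed in Table 2.
MEAN_SQUARES = {
    "Attitude": 98.7218,
    "Length of Service": 6.2524,
    "Interaction (A X L)": 1.9251,
}
ERROR_MEAN_SQUARE = 4.6567
CRITICAL_F_1_PERCENT = 4.98  # footnote value for df = 2, 60 (main effects)

# Each F ratio is the effect's mean square over the error mean square.
f_ratios = {
    source: mean_square / ERROR_MEAN_SQUARE
    for source, mean_square in MEAN_SQUARES.items()
}

attitude_significant = f_ratios["Attitude"] > CRITICAL_F_1_PERCENT
service_significant = f_ratios["Length of Service"] > CRITICAL_F_1_PERCENT
```

The interaction ratio falls below 1, so it cannot reach significance at any conventional level whatever its degrees of freedom.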

The significance of the relationship be- 
tween level of education and acceptance, and 
between the size of the town in which the 
respondent had previously lived and accept- 
ance, were both tested by means of a coeffi- 
cient of contingency. In both instances it 
was found that virtually no relationship
existed. 

Possibly the most important finding of this 
study was that significant differences exist 
among the preference scales established by 
the three attitude groups. Variances of the 
metric values for each article in the prefer- 
ence scales established by the attitude groups 
were examined by means of an L1 test (2),
and significant heterogeneity was found. This 
suggests that the relationship of attitude and 
acceptance varies from item to item. Soldiers 
on the favorable end of the attitude con- 
tinuum differed in their judgments as to what 
constitutes good and bad items of issue from 


Table 3

High and Low Variances for Articles on the Preference Scale

Items Most Affected                      Rank-Order Position by Group
by Attitude              Variance    F               M               U

1. Trousers, OD           .0569      12* (.40)**     [illegible]     [illegible] (-.06)
2. Sweater, high neck     .0411      5 (-.13)        [illegible]     [illegible] (.19)
3. Ike jacket             .0377      13 (.49)        10 (.12)        [illegible] (.20)

Items Least Affected                     Rank-Order Position by Group
by Attitude              Variance    F               M                    U

1. Gloves                 .0022      [illegible]     [illegible] (-.13)   4 (-.16)
2. Cap, HBT               .0006      [illegible]     [illegible] (.01)    7 (-.02)
3. Shirt, cotton          .0002      [illegible]     [illegible] (.21)    12 (.19)

* High numbers represent high position on the preference scale.

** Numbers in parentheses are metric values obtained from Guilford’s scaling procedure (1). These values are in normally
distributed form. When presented in Table 1 each scale value, for each group, was expressed as a distance from the lowest value
obtained for that group.

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soldiers who were unfavorably disposed to- 
ward the Army. Table 3 shows the three ar- 
ticles most affected by attitude differences 
and the three least affected. Note that ar- 
ticles from all positions on the preference 
scales are included in both groups. Particu- 
larly it should be noted that the top group, 
those articles most affected, includes items at 
all levels of acceptance. 

It is also apparent from an inspection of 
this table that it is necessary to attend to 
more than the rank ordering of articles in 
order to evaluate the effect of attitude. For 
example, some items, such as the overcoat 
and necktie, were consistently ranked, on the 
average, as poorest and next to poorest by all 
three groups. These items did not appear in 
the section marked “Items least affected by 
attitude,” however, because even though the
group rankings were consistent, differing vari- 
abilities within the groups produced different 
metric positions for the items. Here it should 
again be noted that the metric scores used in 
the L1 test for homogeneity of variance were
in standard score units, prior to each scale 
value being expressed as a distance from the 
lowest scale value in the group. 
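
As a rough check on how the Table 3 variances relate to the metric values, the sketch below computes the variance of the three group values legible for the Ike jacket. It assumes the tabled figure is the ordinary sample variance (divisor n - 1) across the three attitude groups, which the report does not state; the homogeneity test itself is not reproduced.

```python
from statistics import variance

# Metric values for the Ike jacket in the three attitude groups, as read
# from Table 3 (standard-score units).
IKE_JACKET_METRIC_VALUES = [0.49, 0.20, 0.12]

# statistics.variance uses the n - 1 divisor (sample variance).
ike_jacket_variance = variance(IKE_JACKET_METRIC_VALUES)

REPORTED_VARIANCE = 0.0377  # Table 3 entry for the Ike jacket
difference = abs(ike_jacket_variance - REPORTED_VARIANCE)
```

The computed value agrees with the tabled .0377 to within rounding of the two-decimal metric values, which supports the assumed definition without confirming it.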


Summary 


A questionnaire designed to elicit a general 
attitude toward the Army was administered
to 400 soldiers at Fort Lee, Virginia. From 
this sample three smaller groups of 24 each 
were selected so as to represent those very 
favorable toward the Army, those relatively 
neutral, and those definitely disposed against 
Army life. In addition these groups were se- 
lected so as to encompass a wide range with 
respect to length of service. These soldiers 
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were then asked to evaluate 14 articles of 
standard QM issue in a manner designed to 
indicate the relative acceptance accorded each 
article. 

Results of the study indicate that differ- 
ences in the general attitude toward the Army 
held by soldiers operate to produce changes 
in the manner in which the soldier evaluates 
the articles he is issued to use and wear. 
This probably influences the results obtained 
in an acceptance testing program. The use 
of random or stratified random sampling in 
selecting acceptance testing subjects should 
replace the use of volunteers. This would 
prove a suitable control for this variable and 
permit generalization of results to the entire 
Army population with a higher degree of va- 
lidity. 

In addition, these results suggest that for 
some QM articles more than others, psycho- 
logical factors such as general attitude to- 
ward the Army are markedly influential in 
determining acceptance. These articles, then, 
are those for which acceptance testing is most 
essential. They are the ones, relatively 
speaking, for which the determination of only 
quality or utility is inadequate for the proper 
evaluation of the item. 


Received December 12, 1955. 
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Carroll H. Leeds 
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One approach to the study of teacher per- 
sonality is by way of psychometrics. The 
present article reports a study of teacher 
personality in which use was made of two 
psychometric instruments: the Minnesota 
Teacher Attitude Inventory (MTAI) and 
the Guilford-Zimmerman Temperament Sur- 
vey (GZTS). 

The MTAI (3) was designed to predict 
the social-emotional climate that a teacher 
will maintain in the classroom. It was as- 
sumed in its construction that a teacher’s 
attitudes toward pupils and pupil behavior 
provide an index to the teacher’s personality, 
and thus to his ability in establishing and 
maintaining desirable interpersonal relationships
in the classroom. The MTAI has been
shown to measure teacher-pupil rapport with 
validity coefficients ranging from .46 to .63 
(3, 6). 

The present investigation had as its principal
object an attempt to determine somewhat
more definitely those factors in personality
and temperament that the MTAI is
measuring. The instrument employed in this 
objective was the GZTS (5). A product of 
factor analysis, this instrument was designed 
to identify and measure ten traits of tempera- 
ment: General Activity (G), Restraint (R), 
Ascendance (A), Sociability (S), Emotional 
Stability (E), Objectivity (O), Friendliness 
(F), Thoughtfulness (T), Personal Relations 
(P), and Masculinity (M). Any relation- 
ships noted between scores on these traits 
and MTAI scores would provide some indi- 
cation of what temperament traits tend to 
characterize teachers who maintain harmoni- 
ous relations with pupils, and teachers who 
do not get along well with pupils. 

The MTAI and the GZTS, in this order, 
were administered to 300 public school teach- 
ers in one of South Carolina’s largest cities, 
with a metropolitan population approximating
170,000. Each teacher had been approached
individually and asked to cooperate.
Three hundred teachers, from the first
grade through the twelfth, agreed to partici- 
pate in the study. 

Scores on the MTAI ranged from a plus 
120 to a minus 102, with a mean of 28.6, a 
median of 30, and a standard deviation of 
42.8. 


Correlation of Traits with MTAI 


Product-moment correlation coefficients were 
obtained between MTAI scores and scores 
for each of the ten traits of temperament 
measured by the GZTS. The only trait for 
which the data were treated separately ac- 
cording to sex (270 women and 30 men) was 
Masculinity (M). This was necessary be- 
cause of the sex differential in the norms for 
this trait. 

The correlation coefficients are presented 
in Table 1. They are all significant at the 
1% level of confidence except for three— 
those relating to the traits G, R, and T. The 
only negative coefficient (-.07), nonsignificant
statistically, is that for T. Traits most
closely related to MTAI scores, or to teacher- 
pupil rapport, are P (Personal Relations), F 
(Friendliness), O (Objectivity), and E (Emo- 
tional Stability). Trait P shows the highest 
relationship (.52). This finding is in agree- 
ment with observations concerning this trait 
found in the GZTS Manual. According to 
the test authors: 


Of all the scores, this one has consistently corre- 
lated highest with all criteria involving human rela- 
tions. It seems to represent the core of “getting 
along with others.” ...A high score means toler- 
ance and understanding of other people and their 
human weaknesses. A low score indicates fault- 
finding and criticalness of other people and of insti- 
tutions generally. The low-scoring person is not 
likely to “get along with others” (5, p. 9). 
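
The significance statements about Table 1 can be checked approximately. The report does not say which test was used; the sketch below assumes a two-tailed test based on Fisher's z transformation with a normal approximation, and the function names are illustrative.

```python
from math import atanh, sqrt
from statistics import NormalDist


def fisher_z_statistic(r, n):
    """Approximate standard-normal statistic for testing r against zero."""
    return atanh(r) * sqrt(n - 3)


def significant_at(r, n, alpha=0.01):
    """Two-tailed test of a correlation at level alpha (normal approximation)."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(fisher_z_statistic(r, n)) > critical


# Coefficients quoted in the text and Table 1, with their sample sizes.
personal_relations = significant_at(0.52, 300)
thoughtfulness = significant_at(-0.07, 300)
general_activity = significant_at(0.06, 300)
masculinity_men = significant_at(0.50, 30)
```

On this approximation the coefficients for P and for Masculinity among the 30 men pass the 1% criterion, while those for T and G do not, in line with the report.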


There is definite indication then that teach- 
ers who get along well with pupils tend to be 
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Table 1 


Correlations Between Temperament Trait
Scores and MTAI Scores *

Trait                          Correlation

General Activity (G)               .06*
Restraint (R)                      .06*
Ascendance (A)                     .19
Sociability (S)                    .20
Emotional Stability (E)            .36
Objectivity (O)                    .44
Friendliness (F)                   .36
Thoughtfulness (T)                -.07*
Personal Relations (P)             .52
Masculinity (M)
   Women (N = 270)                 .16
   Men (N = 30)                    .50

* All correlations are significant at the 1% level except those
marked with an asterisk.


cooperative, friendly, objective, and emotion- 
ally stable, and, to a lesser degree, manifest 
sociability, social ascendancy, and masculin- 
ity in emotions and interests. Those who do 
not have high rapport with pupils, on the 
other hand, tend to be critical and intolerant, 
hostile and belligerent, hypersensitive, de- 
pressed, and emotionally unstable. To a 
lesser degree, they tend toward submissive- 
ness, shyness, seclusiveness, and femininity. 
The results also indicate that, to a certain
extent, the MTAI score is an indirect measure
of these temperament traits.


Item Analysis 


To further substantiate and clarify these 
findings, a rough item analysis was made by 
comparing the item responses on the GZTS 
made by the highest and lowest 25% of 
teachers in the distribution of MTAI scores. 
These scores for the 75 teachers in the upper 
group range from 120 to 61 and for the 75 
teachers in the lower group from -3 to
-102. For each of the ten traits, Table 2
presents the number and percentage of the 
30 items discriminating between the highest 
and lowest 75 MTAI scores. The results 
agree fairly closely with the correlations ob- 
tained—the highest frequencies of discrimi- 
nation are found with those traits correlating 
highest with MTAI scores and the lowest fre- 
quencies with those traits showing the least 
correlation. For example, trait P (Personal 
Relations), which correlates .52 with the 
MTAI, has 77% of its items showing dis- 
crimination in the same direction as the 
correlation, none showing discrimination in 
the opposite direction, and 23% showing no 
discrimination. 

The behavior of some of the items in traits 
correlating low with the MTAI deserves some 
explanation and warrants some examination 


Table 2

Number and Per Cent of the 30 Items for Each of 10 Traits Discriminating Between the
Upper and Lower 25% of MTAI Scores

Columns: Trait; Discrimination in Correlation Direction (No., %); Discrimination
Opposite to Correlation (No., %); No Discrimination (No., %)

[Table body illegible; surviving entries: 50, 3, 64, 73, 80]

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of item content. Trait T (Thoughtfulness 
or Thinking Introversion), which correlates 
- .07 with MTAT scores, shows as many as 
40% of its items discriminating in the direc- 
tion of a negative correlation. The item, 
“You enjoy thinking out complicated prob- 
lems,” is agreed to more frequently (67% vs. 
44%) by high MTAI scorers than by low 
scorers. Such agreement contributes to a 
positive correlation. But, there are a num- 
ber of other items in the T category, such as 
the following, which contribute to the ob- 
tained negative correlation: “You are much 
concerned over the morals of your genera- 
tion.” As with the previous item, a “yes” 
response adds to the T score, but, with this 
item, only 39% of the high MTAI scorers 
respond “yes” as contrasted with 76% of the 
low scorers. Previous work with the MTAI 
(2) has identified a so-called Pharisaic-virtue 
attitude which characterizes teachers scoring 
low on the MTAI. This attitude seems to 
involve a perfectionistic and dogmatic adher- 
ence to rigidly established moral principles. 
This item, among others in the GZTS, appar- 
ently taps this attitude with a resultant mini- 
mizing effect upon the measure of the in- 
tended trait, Thoughtfulness. This effect 
undoubtedly contributes to the low correla- 
tion, even slightly negative, of MTAI scores 
with the T trait. 
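
The item comparisons above rest on differences between the percentages of “yes” responses in the upper and lower MTAI groups. The article does not state what criterion was used to call an item discriminating, so the following Python sketch shows only one plausible check: a pooled two-proportion z test, under the assumption that each percentage is based on all 75 teachers in its group. The function names are illustrative and do not come from the article.

```python
import math

def two_proportion_z(p_high, p_low, n_high, n_low):
    """z statistic for the difference between two independent proportions,
    using the pooled proportion for the standard error."""
    pooled = (p_high * n_high + p_low * n_low) / (n_high + n_low)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_high + 1 / n_low))
    return (p_high - p_low) / se

def two_sided_p(z):
    """Two-sided normal-curve probability for a z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

# Percentages quoted in the text; n = 75 per group is an assumption.
z_problems = two_proportion_z(0.67, 0.44, 75, 75)  # "complicated problems" item
z_morals = two_proportion_z(0.39, 0.76, 75, 75)    # "morals of your generation" item

print(f"problems item: z = {z_problems:.2f}, p = {two_sided_p(z_problems):.4f}")
print(f"morals item:   z = {z_morals:.2f}, p = {two_sided_p(z_morals):.6f}")
```

Under these assumptions both items separate the groups well beyond chance, but in opposite directions, which is the pattern the text offers as the reason the T trait as a whole shows almost no correlation with the MTAI.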

The same effect seems to operate with at 
least two other traits in the GZTS. 
In Trait G is the following item: “It is hard 
to understand why many people are so slow 
and get so little done.” A “yes” response to 
this item contributes to a higher G score, but 
at the same time it expresses an attitude of 
hostility toward others which has also been 
found to characterize teachers scoring low on 
the MTAI (2). An item in the R category 
(Restraint), which discriminates in favor of 
the low MTAI scorer, reads as follows: “It is 
difficult for you to understand how some peo- 
ple can be so unconcerned about the future.” 
Agreement with the item contributes to the 
R score, but high MTAI scorers tend to dis- 
agree. In addition to seriousness in one’s 
disposition, this item could tap as well such 
attitudes as hostility and Pharisaic-virtue, or 
even indicate a general neurotic condition. 

Finally, a lack of uniformity is noted in 
the behavior of the items in Trait M (Mas- 
culinity). Item 175 in the M trait reads: 
“When a parent, teacher, or boss scolds you, 
you feel like weeping.” In addition to femi- 
ninity, this item very well could tap hyper- 
sensitivity or neuroticism in general. Con- 
sider the item: “You feel very badly if 
someone does not approve of what you are 
wearing.” Agreement is supposed to indi- 
cate femininity. Could it not indicate neu- 
rotic sensitivity as well? Agreement with 
these two items was a more frequent response 
among low MTAI scorers than among those 
with higher scores. The following items op- 
erate in the same way among the women 
teachers, further raising the question as to 
just what is motivating a “yes” or a “no” re- 
sponse: “You especially dislike to get your 
hands dirty or greasy.” “The sight of ragged 
or soiled fingernails is repulsive to you.” 
“You can handle a loaded gun without feeling 
at all jittery.” “You cry rather easily.” 
“The sight of an unshaven man disgusts 
you.” “When you become emotional you 
come to the point of tears.” When the low 
MTAI scorers agree to these statements, 
rather than expressing femininity primarily, 
it is quite possible that hostility, prudishness, 
and nervous instability represent the essen- 
tial motivating conditions. Among both men 
and women respondents, between 50% and 
75% of the M items discriminating in the 
direction of the correlation are of this nature. 

The same process appears to be operating 
among M items discriminating in the direc- 
tion opposite to that of the correlation as 
indicated in the following item: “You feel 
strongly against kissing a friend of your own 
sex and age.” Although agreement con- 
tributes to the Masculinity score, the low- 
scoring MTAI teachers agree more frequently 
than do the high scorers. 

This matter of item content and the psy- 
chological processes involved in item response 
could account at least partly for the dis- 
crepancy in the results of this investigation 
and the one by Cook and Medley (1) in 
which high MTAI scorers showed more femininity, 
as measured by the Minnesota Multiphasic 
Personality Inventory (MMPI). However, 
Cook and Medley (1) did find results 
similar to those in the present investigation 
when they compared MTAI scores with Depression 
and Social Introversion as measured 
by the MMPI. 

Notwithstanding these comments concern- 
ing some of the GZTS items, correlational 
procedure and item analysis do indicate that 
the temperaments of teachers who have high 
rapport with pupils are characterized by personal 
cooperativeness, friendliness, objectivity, 
and emotional stability. Somewhat less 
obvious are the traits of sociability, ascendance, 
and masculinity. 


Temperament Profiles 


It is realized, of course, that personality 
structure is configurational and that a mere 
identification and summation of independ- 
ently conceived traits does not provide an 
adequate personality picture. With this in 
mind, a somewhat rough attempt was made 
to consider GZTS scores as a whole or pat- 
tern. The trait scores showing the closest 
relation to MTAI scores, namely P, F, O, 
and E, were converted to T scores, summed, 
and averaged into a composite score which 
was then correlated with MTAI scores. The 
resulting correlation was .52, actually no 
higher than that obtained for P alone. A 
multiple correlation possibly would have 
shown a somewhat higher relationship. 
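
The composite procedure described above can be sketched in Python as follows. The raw scores are hypothetical values for six imaginary teachers, not data from the study, and the T scores are computed from the sample's own mean and standard deviation, which is an assumption (the article does not say whether sample or norm values were used for the conversion).

```python
import math
from statistics import mean, pstdev

def t_scores(raw):
    """Convert raw scores to T scores (mean 50, SD 10) within the sample."""
    m, s = mean(raw), pstdev(raw)
    return [50 + 10 * (x - m) / s for x in raw]

def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical raw trait scores (one list entry per teacher).
traits = {
    "P": [22, 15, 19, 25, 12, 20],
    "F": [20, 14, 18, 23, 11, 19],
    "O": [21, 16, 17, 24, 13, 18],
    "E": [19, 15, 20, 22, 12, 17],
}
mtai = [85, 10, 40, 110, -30, 55]  # hypothetical MTAI scores

# Convert each trait to T scores, then average the four T scores per teacher.
converted = [t_scores(scores) for scores in traits.values()]
composite = [mean(per_teacher) for per_teacher in zip(*converted)]

r = pearson_r(composite, mtai)
print(f"composite vs. MTAI: r = {r:.2f}")
```

Because the four traits receive equal weight, such a composite cannot exceed what an optimally weighted combination would give, which is consistent with the remark that a multiple correlation might have shown a somewhat higher relationship.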

Examination of individual GZTS Profile 
Charts for the highest and lowest 25% of 
teachers responding to the MTAI provided 
information supporting in general the find- 
ings thus far discussed. Sixty per cent of the 
75 teachers with high MTAI scores, as com- 
pared with only 16% of the low scorers, show 
relative elevations on the chart in traits 
EOFP and relative “dips” in Traits T and M 
(women subjects).¹ Seventy-six per cent of 
the 75 low scorers on the MTAI, as compared 
with approximately 30% of the high scorers, 
give profile patterns with relative elevations 
on T and M (women subjects) and relative 
“dips” on one or more—usually more—of 
traits EOFP. 


1 Higher scores on the masculinity continuum are 
at the bottom of the chart for women and at the 
top for men. 


Carroll H. Leeds 


No discernible differences between the two 
groups were observed from the GRAS pro- 
files. Slightly over two-thirds of each group 
showed peaks on R with “dips” on A and S. 
One-sixth of each group indicated “dips” on 
R with rises on A and S. 


Comparison of Teacher Sample with Norms 


It is of interest to compare the ten tem- 
perament trait means of the present sample 
of 300 teachers with the norm means pre- 
sented in the Manual (5). With the excep- 
tion of Trait T (Thoughtfulness), the test 
norms are based upon scores obtained from 
523 college men and 389 college women. A 
number of veterans were included in the male 
sample. The norms for the T items are based 
on scores obtained from a group of high 
school seniors and their parents (N = 252). 

Results, presented in Table 3, show that 
differences between means of the teacher 
group and norm group are statistically significant 
at the 1% level for all traits except 
S, T, and M. The teachers’ mean scores are 
higher for traits R, E, O, F, and P and lower 
for traits G and A. 

On the basis, then, of results obtained in 
the present study, teachers as a group differ 
from college seniors in showing more re- 
straint and seriousness, greater emotional sta- 
bility and objectivity, more friendliness, and 
a more cooperative spirit in personal relations. 
College seniors, on the other hand, 
possess more drive and energy and show 
more social boldness. No difference is noted 
between the groups in sociability, reflectiveness, 
and masculinity. 


Table 3 

Mean GZTS Scores for Teacher Group and Norm Group 

Trait         Teacher Group    Norm Group 
G                15.8**           17.0 
R                19.9**           16.4 
A                12.5**           15.0 
S                19.2             18.8 
E                19.8**           16.3 
O                19.7**           17.4 
F                19.5**           14.6 
T                17.7             18.2 
P                19.8**           17.1 
M (Men)          18.9             19.9 
M (Women)        10.3             10.8 

** Significant at 1% level. 


That salesmen of different types may have 
different interest patterns is evident from the 
work of Strong (2), Flemming (1), and 
others. Detailed knowledge of these patterns 
can contribute to the effectiveness of counsel- 
ing services for students, as well as to the 
effectiveness of business concerns in selecting 
salesmen. Three hundred successful sales- 
men, divided equally among specialty sales- 
men, route salesmen, and sales engineers, and 
drawn from 22 different companies, were 
studied for significant differences in interest 
patterns on the Strong Vocational Interest 
Blank For Men. 


Procedure 


The data for this study were derived from Strong 
Interest Blanks completed by salesmen as part of the 
process of applying for a sales position with one of 
the 22 companies involved. Before taking the Strong 
Test as one of a battery of tests, the applicants had 
been preliminarily screened by the use of application 
forms and personal interviews. 

The scores included in this study were for those 
salesmen who were later judged by their sales man- 
agers after a year or more of service to be effective 
salesmen. The managers were asked this question: 
“Would you re-hire this man if he were a new ap- 
plicant and you knew as much about him as you 
now do?” A positive answer was the criterion of 
effectiveness. 

The scores studied were for the following nine 
scales of the Strong Blank: Sales Manager, Real 
Estate Salesman, Life Insurance Salesman, Produc- 
tion Manager, Personnel Manager, Accountant, Office 
Worker, Purchasing Agent, and Advertising Man. 
The six nonselling fields in the foregoing group were 
selected for inclusion in this study as representing 
a partial sampling of the kinds of customers and 
collateral activities with which salesmen become in- 
volved. Subsequent research will include such other 
interest scales as that of engineer, of which occupation 
sales engineers are part. The statistical approach 
was the analysis of variance to determine 
differences among the three groups. 

1 This material is derived from a portion of the 
author’s doctoral dissertation completed at New 
York University. Appreciation is expressed to the 
author’s chief advisor, Dr. Brian E. Tomlinson. 

2 Correspondence relating to the material in this 
article should be addressed to the author at 440 
East 20th Street, New York 9, N. Y. 


Results 


An analysis of variance was undertaken for 
each of the nine Strong scales involved. At 
the .01 level of significance, differences among 
the three groups as detailed in Table 1 were 
found on the scales for production manager, 
accountant, office worker, purchasing agent, 
real estate salesman, and life insurance sales- 
man. At the .05 level, differences were found 
on the personnel scale. 

For the seven scales where significant differences 
were found, Table 1 indicates the result 
of further F tests to determine where the 
specific differences lie. Table 2 gives the 
mean for each group in terms of Strong standard 
scores with the equivalent letter grade as 
commonly utilized in the Strong Test. 


Table 1 

Results of F Tests for Seven Strong Scales Taking the 
Three Sales Groups Two at a Time 

Scale                         F_AB         F_AC           F_BC 
Production Manager            1.42        24.29**        13.96** 
Personnel Manager              .95         2.90          [illegible] 
Accountant                    4.46*        4.34*         17.60** 
Office Worker                13.19**      22.86**        10.40** 
Purchasing Agent             10.41**       1.36          19.29** 
Real Estate Salesman           .24        12.84**         9.59** 
Life Insurance Salesman        .27        [illegible]    28.42** 

* Significant at the .05 level. 
** Significant at the .01 level. 
Note. A = Specialty salesmen, B = Route salesmen, C = Sales engineers. 
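
The two-stage procedure, an overall analysis of variance followed by F tests on the groups taken two at a time, can be sketched in Python as follows. The scores are hypothetical and serve only to show the computation. The sketch also assumes that each pairwise F uses only the two groups being compared; the article does not say whether the error term was pooled over all three groups.

```python
from statistics import mean

def f_ratio(*groups):
    """One-way analysis-of-variance F ratio for two or more groups."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical standard scores on one Strong scale (not data from the study).
specialty = [38, 41, 36, 40, 39]   # group A
route = [40, 42, 37, 41, 40]       # group B
engineers = [46, 44, 47, 43, 45]   # group C

overall = f_ratio(specialty, route, engineers)
pairwise = {
    "AB": f_ratio(specialty, route),
    "AC": f_ratio(specialty, engineers),
    "BC": f_ratio(route, engineers),
}

print(f"overall F = {overall:.2f}")
for pair, f in pairwise.items():
    print(f"F_{pair} = {f:.2f}")
```

With two groups the F ratio equals the square of the corresponding t statistic, so each pairwise F locates a difference that the overall test can only signal.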


Discussion 


Certain differentiating patterns emerge that 
may be helpful in distinguishing specialty 
salesmen, route salesmen, and sales engineers. 
It will be noted that sales engineers, as represented 
by the sample, rate significantly 
higher on the production manager scale than 
do specialty salesmen or route salesmen. 
There is no significant difference between 
specialty salesmen and route salesmen on 
this scale. 


Table 2 

Means for Each of the Three Sales Groups in Terms of Strong Standard Scores 
with Equivalent Letter Grades 

                          Specialty Salesmen        Route Salesmen            Sales Engineers 
Interest Scale            Score   Grade             Score         Grade       Score   Grade 
Production Manager         39     B                 40            B Plus       45     B Plus 
Personnel Manager          43     B Plus            41            B Plus       45     A 
Accountant                 36     B                 [illegible]   B            33     B Minus 
Office Worker              45     A                 [illegible]   A            39     B 
Purchasing Agent           40     B Plus            [illegible]   B Plus       39     B 
Sales Manager              52     [illegible]       [illegible]   A            50     A 
Real Estate Salesman       48     [illegible]       [illegible]   A            44     B Plus 
Life Insurance Salesman    49     [illegible]       [illegible]   A            43     B Plus 
Advertising Man            37     [illegible]       [illegible]   B Minus      35     B 


That sales engineers should surpass the 
other two sales groups on the production 
manager scale probably reflects the fact that 
such salesmen deal with factory or plant 
personnel to a large extent and should share 
some interests with their customers if they 
are to understand and solve production problems 
as a basis for sales. Frequently, both 
sales engineers and production managers 
share a common engineering training. 

On the personnel manager scale, the sales 
engineers sampled rated significantly higher 
than the route salesmen. This may reflect a 
greater liking for working with people and 
their problems in general which can enable 
the sales engineer to sustain his enthusiasm 
through the analysis of technical problems 
and during sometimes prolonged contact with 
customer personnel in the course of selling a 
single customer. 

On the accountant scale, route salesmen 
ranked highest, followed by specialty salesmen, 
with the sales engineers showing least 
interest in common with men in the accountancy 
field. It should be noted, however, that 
even for the highest group, the route salesmen, 
the equivalent letter grade is only “B.” 


This suggests that salesmen as a group do 
not have strong interests in common with 
men handling computational data as in ac- 
counting, but that route salesmen may need 
to have more of this interest because of the 
record-keeping activities related to the many 
calls they must make each day. 

For the office worker scale, the three groups 
are also significantly different from each other, 
with the route salesmen again ranking highest. 
This probably reflects the stock-checking, 
bill-collecting, and order-taking activities 
that are a part of route selling. The sales en- 
gineers ranked lowest of the three groups for 
this factor. 

On the purchasing agent scale, the route 
salesmen were again the highest ranking, dif- 
fering significantly from both the specialty 
salesmen and the sales engineers. Thus, for 
the three interest scales, accountant, office 
worker, and purchasing agent, which are con- 
cerned with the handling of business detail, 
the route salesmen are shown to be signifi- 
cantly different in interest pattern. 

This predominance of the route salesmen 
on these factors may be a necessary accom- 
paniment of this type of selling in which a 
regular flow of distribution already exists. 
This is in contrast to the job of the specialty 
salesman who must create a desire to buy on 
the part of a prospect, and in contrast to the 
sales engineer’s job where some technical 
problem may need to be solved for the customer 
before a sale can be made. 

On the real estate salesman scale, all three 
groups had interest ratings of either “A” or 
“B Plus.” The sales engineers, however, 
ranked significantly lower on this scale than 
the other two groups. For the life insurance 
salesman scale, a similar situation applies, 
with the sales engineers achieving a “B Plus” 
rating which was significantly lower than the 
“A” ratings achieved by the other two groups. 

This suggests that sales engineers as a 
group are likely to rank lower on those Strong 
interest scales dealing directly with interest 
in selling. This difference is probably ex- 
plainable in terms of the fact that sales en- 
gineers are generally technically trained men, 
often graduate engineers, who derive satis- 
faction from dealing with engineers and other 
technical men in customers’ organizations, 
and from solving customers’ technical prob- 
lems, as well as from the actual sales aspects 
of the job. 

In addition to the differentiating charac- 
teristics discussed above, certain common 
characteristics among the three groups are re- 
vealed. One of these is a relative lack of in- 
terest in common with advertising men as 
measured by the Strong advertising man’s 
scale. For the three groups, the mean Strong 
rating achieved is considered an indetermi- 
nate one. 

“B Plus” or “A” strength interest ratings 
are shared by the three groups in the areas 
of personnel manager, sales manager, real 
estate salesman, and life insurance salesman. 
While the three groups do have in common 
the fact that they obtained noteworthy rat- 
ings for these four scales, Table 2 also re- 
veals that for the personnel manager, real 
estate and life insurance salesman scales there 
were significant intergroup differences. These 
have been previously pointed out in the discussion 
of intergroup differences. 


Summary and Conclusions 


Strong interest patterns of three types of 
salesmen were compared. Significant inter- 
group differences were revealed on seven of 
the interest scales studied for this sample of 
300 salesmen. In addition, certain common 
characteristics were also noted with regard to 
five of the interest scales. 

These findings suggest that while sales- 
men as a group may share certain interest 
factors, there are also differentiating aspects 
of their patterns that should be of value to 
school counselors dealing with students who 
are considering a sales occupation, and also 
to business organizations concerned with the 
selection of salesmen. 

This study points to certain tentative con- 
clusions as to the existence and nature of 
these differences as measured by several in- 
terest scales on the Strong Blank. These 
tentative conclusions are, of course, subject 
to confirmation by cross-validation studies 
based on additional independent samples. 

The results of this study support the trend 
away from the concept of salesmen in gen- 
eral toward the concept of special sales oc- 
cupational groups. 


Received February 16, 1956. 


References 


1. Flemming, E. G., & Flemming, C. W. Test se- 
lected salesmen. J. Marketing, 1946 (April), 
1-8. 

2. Strong, E. K., Jr. Vocational interests of men 
and women. Stanford Univer.: Stanford Uni- 
ver. Press, 1943. 





The Journal of Applied Psychology 
Vol. 40, No. 5, 1956 


Gates (1) in 1918 tested the abilities of an 
expert marksman. Spaeth and Dunham (4) 
and Humphreys, Buxton, and Taylor (2) 
have related steadiness measures to rifle 
marksmanship.² It appears that no one, 
however, has attempted to predict rifle marks- 
manship from pretraining data. The present 
study attacks this problem and offers predic- 
tive data for Army recruits trained by two 
different methods, a part method and a whole 
method.³ 


Method 


Subjects. The experiment was conducted twice, 
each time at a different military installation. The 
first administration of the experiment used 68 Ss at 
Fort Knox, Kentucky; the second, 88 Ss at Fort 
Jackson, South Carolina. In both cases, the Ss 
were male, light infantry basic trainees, with “A” 
physical profiles. 

Procedure. The criterion data were obtained on 
an Army rifle range during four days of firing. 
Each S fired a total of 100 rounds in slow fire and 
72 rounds in sustained (rapid) fire at target distances 
of from 100 to 500 yds. For each of the two 
training methods, the Wherry-Doolittle method was 
used to obtain the multiple R between each criterion 
(slow fire; sustained fire) and seven pretraining 
variables: (a) rifle steadiness, as measured by an 
ataxiameter (3), (b) firing experience, as scored with 
a questionnaire,⁴ (c) educational level, as defined by 
the number of years of schooling, (d) and (e) intelligence, 
as measured by scores on the Armed 
Forces Qualification Test and by Aptitude Area I 
scores from the Army Classification Battery, (f) 
mechanical aptitude, as measured by the score on 
the Mechanical Aptitude Test of the Army Classification 
Battery, and (g) mechanical information, as 
measured by the score on the Shop Mechanics Test 
of the Army Classification Battery. The data for 
variables c-g were obtained from the trainees’ Army 
personal data files. Multiple Rs were also obtained 
by using only two of the predictor variables, intelligence 
and firing experience. 

1 F. J. McGuigan is now at Hollins College. The 
research reported here was conducted by the authors 
while they were employed by the Human Resources 
Research Office, The George Washington University, 
operating under contract with the Department of 
the Army. Opinions and conclusions are those of 
the writers, and do not necessarily represent views 
of the University or the Department of the Army. 

2 The rather high correlation coefficients reported 
in these studies were not found in a recent study by 
McGuigan and MacCaslin (3), who found the relationship 
between rifle steadiness and marksmanship 
to be relatively low. 

3 The data presented in this paper were obtained 
in the course of a larger study (see McGuigan, F. J. 
and MacCaslin, E. F. Whole and part methods in 
learning a perceptual motor skill. Amer. J. Psychol., 
1955, 68, 658-661). In that study, the superiority 
of the whole method over the part method 
was found to (a) be significant for slow fire for Ss 
of all levels of intelligence, and (b) approach significance 
for sustained fire for Ss of above-average 
intelligence. 


Results 


Prediction of rifle marksmanship. The 
variables selected by the Wherry-Doolittle 
method were not selected consistently in each 
administration of the experiment. The fre- 
quency of selection of two variables, intelli- 
gence (Aptitude Area I score) and firing ex- 
perience, and the fact that most of the other 
variables correlated well with intelligence, 
suggested that these two variables be used 
throughout. Table 1 shows the 
Wherry-Doolittle and two-variable Rs obtained; the 
two processes showed no significant differences. 

Prediction of rifle marksmanship as a func- 
tion of training method. The two-variable 
Rs for each administration of the experiment 
were found to be not significantly different 
from each other and were averaged by means 
of Fisher’s z method. For slow fire, the mean 
two-variable Rs are .38 for the part method 
and .61 for the whole method. The differ- 
ence between these Rs approaches statistical 
significance at the 5% level. For sustained 
fire, the mean two-variable Rs are .32 for the 
part method and .67 for the whole method. 
These Rs differ significantly beyond the 1% 
level. Training by the whole method thus 
appears to give higher predictability for the 
pretraining variables studied here than training 
by the part method does. 

4 The firing experience questionnaire developed for 
the purposes of this study took about 15 min. for 
group administration. 


Table 1 

Correlations Between Predictor Variables and Marksmanship Criteria 

[Column headings: Part Method (Slow Fire, Sustained Fire); Whole Method (Slow Fire, 
Sustained Fire). Only the first entry of each row is legible in the source.] 

Replication 
Fort Knox 
  Wherry-Doolittle R     .44* 
  Two-Variable R         .40 
  N                      33 
Fort Jackson 
  Wherry-Doolittle R     .33 
  Two-Variable R         .36 
  N                      44 

* Significant beyond the 5% level. 
** Significant beyond the 1% level. 
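
The averaging and comparison of Rs reported above can be sketched in Python as follows. Several points are assumptions rather than statements from the article: the surviving Table 1 entries (.40 and .36, with Ns of 33 and 44) are read as the part-method slow-fire column; the whole-method Ns of 35 and 44 are inferred from the totals of 68 and 88 Ss; each z is weighted by N - 3; and the usual Fisher z test for a simple r is applied to the multiple Rs as an approximation.

```python
import math

def fisher_z(r):
    """Fisher's z transformation of a correlation coefficient."""
    return math.atanh(r)

def average_r(rs, ns):
    """Average correlations through Fisher's z, weighting each by N - 3."""
    weights = [n - 3 for n in ns]
    z_bar = sum(w * fisher_z(r) for r, w in zip(rs, weights)) / sum(weights)
    return math.tanh(z_bar)

def z_difference(r_a, ns_a, r_b, ns_b):
    """Normal deviate for the difference between two averaged correlations."""
    var_a = 1 / sum(n - 3 for n in ns_a)
    var_b = 1 / sum(n - 3 for n in ns_b)
    return (fisher_z(r_b) - fisher_z(r_a)) / math.sqrt(var_a + var_b)

part_ns = [33, 44]    # read from Table 1 (assumed to be part-method Ns)
whole_ns = [35, 44]   # inferred from 68 and 88 total Ss (assumption)

part_slow = average_r([0.40, 0.36], part_ns)
z_slow = z_difference(0.38, part_ns, 0.61, whole_ns)
z_sustained = z_difference(0.32, part_ns, 0.67, whole_ns)

print(f"averaged part-method slow-fire R = {part_slow:.2f}")
print(f"slow fire:      z = {z_slow:.2f}")
print(f"sustained fire: z = {z_sustained:.2f}")
```

Under these assumptions the averaged part-method R for slow fire rounds to the reported .38, the slow-fire difference falls just short of the 1.96 needed at the 5% level, and the sustained-fire difference exceeds the 2.58 needed at the 1% level, in line with the results stated in the text.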


Summary 


This study obtained multiple correlations 
showing the relationship between seven pre- 
training variables (rifle steadiness, firing ex- 
perience, educational level, two measures of 
intelligence, mechanical aptitude, and mechanical 
information) and end-of-training 
marksmanship. It was found that two of the 
variables, intelligence and firing experience, 
predicted end-of-training marksmanship sub- 
stantially as well as all seven variables taken 
together. It was also found that higher pre- 
dictability was obtained by using the whole 
method than by using a part method. The 
average two-variable Rs for the whole method 
were .61 for slow fire and .67 for sustained 
(rapid) fire; for the part method, .38 for 
slow fire and .32 for sustained fire. 


Recently two successive issues of this Journal 
have contained two articles that perpetuate 
a commonly held misconception (3, 14). 
The type of study of concern is one in which 
two or more different weighting methods are 
applied to a set of tests or test items. Vari- 
ous characteristics of the keys thus produced 
are then compared. The characteristic of 
major concern here is validity. Studies of 
this type may be divided into two kinds: 
One kind actually compares the validity of 
the two differently weighted keys and is 
illustrated by Jurgensen (3), although many 
similar articles can be found (1, 6, 8, 11, 12). 
The second kind simply correlates the two 
keys and concludes from this very high cor- 
relation, say .98, that the two keys must 
necessarily have equivalent or nearly equiva- 
lent validities for any given criterion. This 
kind of study is illustrated by Trites and 
Sells (14) and others (e.g., 2, 4, 5, 7, 9, 10, 
13, 15). 

Studies of the second kind make false con- 
clusions when they imply or explicitly con- 
clude from the typically high correlation be- 
tween two differently weighted composites 
that these composites must have equal validi- 
ties. For example, Trites and Sells state, “It 
may be concluded, then, that in most in- 
stances there is little gained by use of frac- 
tional weights” (14, p. 454). In his classic 
article Stalnaker, similarly, concludes from a 
correlation of .99 that, “The relationship be- 
tween the weighted and unweighted scores is 
so high, so nearly perfect, that there is little 
justification for the use of weights with these 
examinations. . . . The influence of the usual 
weighting factors is so small as to be insig- 
nificant” (10, p. 490). Again, Webb says, 
“However, since the scores on the Likert scale 
obtained by the Likert and Thurstone scoring 
methods are highly correlated, one might ex- 
pect the Thurstone scale to possess a degree 
of validity approximate to that of the Likert 
scale” (15, p. 469). 


Studies of the first kind make proper con- 
clusions, since validities are actually com- 
pared. However, the same fallacy occasionally 
appears. For example, Jurgensen says, “Cor- 
relations between statistically determined and 
arbitrarily assigned weights were so high that 
they can be considered one and the same” 
(3, p. 307). Only Strong’s voice has been 
raised in opposition. “Our experience... 
shows that two systems of testing may cor- 
relate over .90 and have equally high reli- 
ability and yet one may have much higher 
validity than the other” (11, p. 70). 


The Proof 


The magnitude of the error in concluding 
equal validities for keys that correlate ex- 
tremely high will be displayed by consider- 
ing the simple case of two such keys and one 
criterion. Together with the requirement 
that the multiple correlation shall not exceed 
unity and the formula for multiple correla- 
tion in terms of Pearson rs, the following 
formula can be derived (e.g., 16, p. 280): 


$$r_{2y} = r_{12}\,r_{1y} \pm \sqrt{1 - r_{12}^{2} - r_{1y}^{2} + r_{12}^{2}\,r_{1y}^{2}},$$


where variables 1 and 2 are scores on the two 
keys and variable y is the criterion. This 
formula was used in constructing Table 1 
and expresses the limits of the validity of 
variable 2 in terms of the validity of vari- 
able 1 and the intercorrelation between 1 and 
2. The purpose of Table 1 is to show the 
possible differences in validities for highly 
correlated keys. While these limits are theo- 
retically possible, it should be noted that 
they would be unlikely of attainment in prac- 
tice. Most of the studies of the first kind 
noted above illustrate this. The correlations 
chosen were selected on the basis that they 
represented typical validities and were typi- 
cal of the correlations found between two 
keys used to score the same tests. 
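
The limits on the validity of variable 2 can be traced step by step. The derivation below is a sketch that assumes the standard two-predictor formula for the squared multiple correlation; the symbol $R_{y\cdot 12}$ is introduced here for convenience and does not appear in the article. Multiplying through by $1 - r_{12}^{2}$ is legitimate because the keys are assumed not to correlate perfectly.

```latex
\begin{align*}
R_{y\cdot 12}^{2}
  &= \frac{r_{1y}^{2} + r_{2y}^{2} - 2\,r_{12}\,r_{1y}\,r_{2y}}{1 - r_{12}^{2}} \le 1 \\[4pt]
  &\Longrightarrow\quad
  r_{2y}^{2} - 2\,r_{12}\,r_{1y}\,r_{2y} + \left(r_{12}^{2} + r_{1y}^{2} - 1\right) \le 0 \\[4pt]
  &\Longrightarrow\quad
  r_{12}\,r_{1y} - \sqrt{\left(1 - r_{12}^{2}\right)\left(1 - r_{1y}^{2}\right)}
  \;\le\; r_{2y} \;\le\;
  r_{12}\,r_{1y} + \sqrt{\left(1 - r_{12}^{2}\right)\left(1 - r_{1y}^{2}\right)}
\end{align*}
```

The last line follows from solving the quadratic in $r_{2y}$; expanding the product under the radical gives $1 - r_{12}^{2} - r_{1y}^{2} + r_{12}^{2} r_{1y}^{2}$. The interval is centered on $r_{12} r_{1y}$, and its half-width shrinks to zero only as $r_{12}$ approaches unity.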

Table 1 

Maximum and Minimum Validities of One Key, Given 
the Validity of the Other Key and the 
Correlation Between the Keys 

Key                        Validity of Other Key 
Intercorrelation     .30        .40            .50       .60 
.998                24-36*     [illegible]    44-55     55-65 
.996                21-38      32-48          42-58     53-67 
.994                19-40      [illegible]    40-59     51-68 
.992                18-42      [illegible]    39-61     49-70 
.990                16-43      [illegible]    37-62     48-71 
.980                10-48      21-57          32-66     43-75 
.960                02-56      [illegible]    24-72     34-82 
.940               -04-61      06-69          17-77     29-84 
.920               -10-65      01-73          12-80     24-87 
.900               -15-69     -04-76          07-83     19-89 

* Decimal points have been omitted from the body of the table. 


Taking the values from Jurgensen’s research, 
the use of the table may be illustrated. 
If two keys correlate .996 and one key 
correlates .40 with the criterion, then it 
would be conceivable for the other key to 
correlate with the criterion as high as .48 or as low as .32. In 
the opinion of the writer the difference be- 
tween .40 and .48 might well be of practical 
importance in an actual prediction problem. 
When the correlation between the scores on 
the two keys is lower the possible difference 
between the two validities increases very 
rapidly. For example, if the keys correlate 
.98 one key could have a validity of .40 and 
the other a validity as high as .57 or as low 
as .21. For a correlation of .92, the validity 
of one key could be .40 and the other as high 
as .73 or as low as .01! 
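
The worked figures in this paragraph can be checked with a short Python sketch. It assumes the limits take the form of the product of the key intercorrelation and the known validity, plus or minus the square root of (1 - r12²)(1 - r1y²), which is what the requirement that the multiple correlation not exceed unity implies. The function name `validity_limits` is illustrative only.

```python
import math

def validity_limits(r12, r1y):
    """Lowest and highest validity possible for key 2, given the correlation
    between the keys (r12) and the validity of key 1 (r1y)."""
    centre = r12 * r1y
    half_width = math.sqrt((1 - r12 ** 2) * (1 - r1y ** 2))
    return centre - half_width, centre + half_width

# Key 1 has a validity of .40; vary the correlation between the two keys.
for r12 in (0.996, 0.98, 0.92):
    low, high = validity_limits(r12, 0.40)
    print(f"r12 = {r12:.3f}: validity of the other key from {low:.2f} to {high:.2f}")
```

The same function can be used to regenerate any cell of Table 1 or to extend it to intercorrelations and validities not tabled.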

Table 1 has other applications. Much of 
the research on the speed vs. power issue can 
be similarly criticized. The common practice 
of “validating” a new test by correlating it 
with an already extensively validated test 
fallaciously implies that the new test will 
have similar validities. In general, psycholo- 
gists seem to have been rather unduly im- 
pressed with high correlations. 


Conclusion 


The moral is plain. One may use two dif- 
ferent sets of item weights to score the same 
group of tests and find an exceedingly high 
correlation between the scores thus produced. 
Still, as long as the correlation is not 1.00, it 
is possible to find that the validities of the 


two keys differ to a statistically significant 
and practically important degree. The no- 
tion that just because two keys correlate very 
highly they may be used interchangeably for 
any purpose is false. 


Fleishman (3) described the development 
by the Ohio State Leadership Studies of a 
measure of leadership attitudes in industry. 
The Leadership Opinion Questionnaire yields 
two scores, Consideration and Initiating Struc- 
ture, with respective estimated reliabilities of 
.70 and .79 and an intercorrelation of — .01. 
A foreman with a high Initiating Structure 
score would favor assigning people in the 
work group to particular tasks, criticizing 
poor work, and emphasizing the meeting of 
deadlines. A foreman with a high Consid- 
eration score would emphasize getting the 
approval of the work group on important
matters before going ahead; he would stress 
willingness to make changes and the doing of 
personal favors for people in the work group 
(1). 

Fleishman applied the scales to evaluating 
the effects of a leadership training program 
among International Harvester supervisors. 
He noted that foremen who operated under 
more considerate “climates” described them- 
selves as more considerate. A more consid- 
erate climate was one where the foremen be- 
lieved they had more considerate bosses, and 
the bosses wanted their foremen to be more 
considerate. Low positive correlations also 
existed between the foremen’s attitudes of 
consideration and initiation, their bosses’ be- 
havior, and the foremen’s estimates of what 
was expected by the bosses (2). 

Labor grievances were lower where the 
foremen’s bosses expected foremen to be con- 
siderate, where foremen were described so, 
and to a lesser extent where the foremen per- 
ceived such expectations by their bosses (1). 

These results led to the expectation that 
the Leadership Opinion Questionnaire could 
be used to forecast success as a supervisor in 
a company which has been a leading exponent 
of progressive personnel practices in recent 
years and where heavy emphasis has been 
placed on the value of the individual worker. 
(High capital investment and relatively low
labor costs of production make such an enlightened
attitude both sensible and possible.)
It was hypothesized that supervisors in this 
setting, who held more favorable attitudes 
toward consideration as a mode of leadership 
behavior, would be rated more highly by
their superiors.


Method 


Seventy-seven supervisors, most of whom were at 
the lowest or second lowest level in the management 
hierarchy of a petrochemical refinery, were adminis- 
tered the Leadership Opinion Questionnaire in which 
they indicated what they, as supervisors, ought to 
do, not what they actually did do. The Initiating 
Structure and Consideration scores obtained from 
the questionnaire were correlated with forced-choice 
performance reports collected for 53 of these super- 
visors approximately two years later. These per- 
formance reports by superiors had been found to 
discriminate validly (r= .62 to .84) among super- 
visors voted high, medium, and low, by pooled judg- 
ments (4). Odd-even and equivalent form reliabili- 
ties were above .90. 


Results 


In line with expectations, a correlation of 
.29, significant at the 5% level for 51 df,
was obtained between Consideration and the 
forced-choice performance report two years 
later. At the same time, a correlation of only 
-.09 was found between rated success as a
supervisor and attitudes favoring Initiating
Structure. Thus, the extent to which supervisors
conformed verbally in attitude to the company
Zeitgeist forecast their rated success as
supervisors two years later.
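
The significance statement above can be checked with the usual t test for a correlation coefficient. The sketch below is illustrative only; the function name is hypothetical, and the critical value of about 2.01 for 51 df (two-tailed, 5% level) is taken from standard t tables rather than from the article.

```python
import math

def t_from_r(r, df):
    """Student's t for testing a Pearson r against zero, with df = N - 2."""
    return r * math.sqrt(df / (1.0 - r * r))

# Assumed reference value: two-tailed 5% critical t for 51 df is roughly 2.01.
CRITICAL_T_05 = 2.01

t_consideration = t_from_r(0.29, 51)
t_structure = t_from_r(-0.09, 51)

print(round(t_consideration, 2), abs(t_consideration) > CRITICAL_T_05)  # 2.16 True
print(round(t_structure, 2), abs(t_structure) > CRITICAL_T_05)          # -0.65 False
```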

On the one hand, the obtained correlation 
of .29 between Consideration and future suc- 
cess as a supervisor is probably too low by 
itself for practical significance as a predictor 
of supervisory success; on the other hand, 
the subjects, already employed supervisors, 
were undoubtedly more homogeneous in train- 
ing, attitude, and ability than candidates or 
applicants. A higher correlation would be 
expected in a more heterogeneous sample. 
The Leadership Opinion Questionnaire may
provide a valuable addition to a supervisory 
selection battery in organizations emphasiz- 
ing the need for supervisors to be considerate. 
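
The remark that a higher correlation would be expected in a more heterogeneous sample can be illustrated with the standard correction for direct restriction of range. This is not a computation made in the article: the function name is hypothetical, and the ratio of applicant to incumbent standard deviations (1.5 below) is an invented value used only to show the direction and rough size of the effect.

```python
import math

def correct_for_range_restriction(r_restricted, sd_ratio):
    """Estimate the correlation in an unrestricted group.

    sd_ratio is the predictor's standard deviation in the unrestricted
    group divided by its standard deviation in the restricted group.
    """
    r, k = r_restricted, sd_ratio
    return (r * k) / math.sqrt(1.0 - r * r + r * r * k * k)

# Hypothetical case: applicants 1.5 times as variable as employed supervisors.
print(round(correct_for_range_restriction(0.29, 1.5), 2))  # 0.41
```

With a ratio of 1.0 (no restriction) the function returns the observed correlation unchanged.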


Summary 


The Leadership Opinion Questionnaire was 
administered to supervisors of a firm noted 
for its emphasis on progressive personnel relations
and interest in the welfare of the individual
employee. A correlation of .29 was
found between the extent to which a super- 
visor believed he ought to be considerate of 
his subordinates and the extent to which he 
was rated a successful supervisor by his superiors
two years later. No consistent relation
was found between favoring Initiation
of Structure and rated success.

The scarcity of reports on the validation of
attitude measures in terms of their ability to 
predict individual respondent behavior has 
been cited by McNemar (8), Campbell and 
Katona (2), and Blankenship (1), as well as 
others. In one of the few reported studies, a 
correlation of .024 was found between an 
attitude scale on cheating and actual cheat- 
ing behavior by a class of college students 
(3). Validity of respondent answers has 
most often been checked by comparing an- 
swers with records (6) or evidence of recent 
behavior (7). 

The study described in this paper was an 
exploratory design for measuring the validity 
of individual prediction of behavior over vari- 
ous periods of time and the predictive va- 
lidity of a related attitude scale. 

The study required (a) frequent access to 
the sample and (b) a behavioral criterion
which was well defined. The sample selected 
consisted of college students. The behavioral 
criterion selected was attendance at college 
football games. 

In the fall of 1950, a week prior to the 
football season, a questionnaire was adminis- 
tered to 253 students at the University of 
Southern California. The questionnaire listed 
the University’s football games; the students 
indicated whether they would or would not 
attend each game, or were doubtful. The 
questionnaire also contained 12 questions re- 
lated to attitude toward football games. A 
sample question and its answer categories 
follow:

“How would you describe attendance at
football games? (check one)

“Very worthwhile ____; Worthwhile ____;
Not very worthwhile ____; Worthless ____.”

Each student signed his questionnaire. Scoring
weights for the answer categories ranged
from 1 through 4, in order from least to 
most favorable. Answer-category scores were 
summed to yield scale scores. 
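
The scoring rule just described can be sketched as follows. The weight table is written out only for the sample item quoted above; the names `SAMPLE_ITEM_WEIGHTS` and `scale_score` are illustrative, and it is assumed that every one of the 12 items was weighted 1 through 4 in the same way.

```python
# Weights for the sample item, least to most favorable (1 through 4).
SAMPLE_ITEM_WEIGHTS = {
    "Worthless": 1,
    "Not very worthwhile": 2,
    "Worthwhile": 3,
    "Very worthwhile": 4,
}

N_ITEMS = 12  # number of attitude questions on the questionnaire

def scale_score(item_weights):
    """Sum the 1-4 answer-category weights over the 12 attitude items."""
    if len(item_weights) != N_ITEMS:
        raise ValueError("expected one weight per item")
    if any(w not in (1, 2, 3, 4) for w in item_weights):
        raise ValueError("weights must be 1, 2, 3, or 4")
    return sum(item_weights)

# The extremes reproduce the possible range of the scale.
print(scale_score([1] * N_ITEMS), scale_score([4] * N_ITEMS))  # 12 48
```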


On each Monday following a game, the 
students present in class (varied from 193 
to 234) were given a brief form on which 
they checked whether they had or had not 
attended the game. The students also signed
these forms. (It is assumed in this study 
that the validity of reported attendance was 
high and consistent from game to game.) 
The possibility that students biased their 
postgame reports in accordance with their 
preseason prediction is considered negligible. 
The students were not aware of the purpose 
of the study; those who made guesses thought 
the study related to the effect of televised 
games on game attendance. 

Since the students signed both the presea- 
son survey and the follow-up surveys, it was 
possible to tabulate for each student his pre- 
dictions, scale score, and subsequent behavior 
(5). 

The attitude scale had a possible score 
range of 12 to 48. The obtained range was 
27 to 47, the median 39.5, the mean 38.4, 
the standard deviation 3.7. The mean for 
those predicting attendance at the first game 
was 42.7; for those predicting nonattendance, 
36.5. The median was used as a critical 
score for computation of attitude scale correlations.
The correlation used is the phi coefficient,
corrected for restriction in size (phi/phi max.)
(4, p. 433). All correlations were
statistically significant (p < .01).
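
A minimal sketch of the phi/phi max. computation for a fourfold table is given below. The function names and the cell counts are hypothetical and are not the study's data; the sketch covers the case of a positive association, where phi max. is the largest phi attainable with the marginal totals held fixed.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a fourfold table of counts [[a, b], [c, d]]."""
    rows = (a + b) * (c + d)
    cols = (a + c) * (b + d)
    return (a * d - b * c) / math.sqrt(rows * cols)

def phi_max(a, b, c, d):
    """Largest positive phi attainable with the same marginal totals."""
    n = a + b + c + d
    p_row = (a + b) / n   # proportion in the first row
    p_col = (a + c) / n   # proportion in the first column
    p_joint_max = min(p_row, p_col)
    return (p_joint_max - p_row * p_col) / math.sqrt(
        p_row * (1 - p_row) * p_col * (1 - p_col)
    )

def corrected_phi(a, b, c, d):
    """Phi expressed as a fraction of its maximum (phi / phi max.)."""
    return phi_coefficient(a, b, c, d) / phi_max(a, b, c, d)

# Hypothetical counts: rows = above/below median scale score,
# columns = attended / did not attend.
print(round(phi_coefficient(40, 10, 20, 30), 3))  # 0.408
print(round(corrected_phi(40, 10, 20, 30), 3))    # 0.5
```

The correction matters most when the two dichotomies are split very unequally, since unequal marginals hold phi max. well below 1.00.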

Attendance at all games was less than pre- 
dicted (Table 1). Increased error with re- 
moteness in time is shown for those predict- 
ing attendance; predictions of nonattendance 
were much more valid. The “doubtful” 
group in general attended games in larger 
proportions than those predicting nonattend- 
ance, but not in proportion to original group 
prediction, subsequent group attendance, or 
any other proportion revealed by the data. 

The increase in error of prediction with 
remoteness in time is again shown in Table 2. 





Peter A. Holman 


Table 1

Preseason Prediction and Subsequent Attendance

Game: First, Second, Third, Fourth, Fifth, Sixth, Seventh

Category                              First
Size of sample                        231
Predicting attendance (%)             66
Attending (%)                         65
“Will attend” attending (%)           96
“Will not attend” attending (%)       7
“Doubtful” attending (%)              21

Second
234

210 196
74 65 33*
61 63 25
80 72 60
12 5 4
7 21 13

* An “Away” game customarily attended by many students.
** Rain.

The error increased sharply between the
fourth and fifth game (p < .01).

The correlations of scale score with prediction
and with behavior showed an inverse relationship,
indicating the fallibility of
prediction as a criterion for attitude scale
validity. (The correlations indicate that atti- 
tude was a stronger determiner of attendance 
at the fourth (away) game than any of the 
other games. A contrary conclusion would 
have been drawn if prediction were the 
criterion.) The correlations between scale 
score and prediction tended to increase over 
the season and in general were inverse to the 
trend of the correlations between student prediction
and attendance. The correlation between
scale score and total games attended
was higher (p < .01) than the correlation
between scale score and total game attend- 
ance predicted. 

Observation indicated that “football atti- 
tude” was not the only factor determining 
game attendance. Some students worked
Saturdays; for some students game attendance
was a social outing. If these factors
had been considered in advance, the initial 
survey could have determined which students 
worked Saturdays, and a “social attitude” 
scale could have been added to the question- 
naire. Knowledge of these factors might have 
made possible a better test of “football atti- 
tude” relationship to attendance. 

As a test of the possible predictive validity 
of the scale under such conditions, the writer 
predicted that (a) those students with high 
scale scores predicting attendance would attend,
and (b) those students with low scale
scores predicting nonattendance would not attend.
The correlations between prediction
and attendance for these two groups, shown
in the last row of Table 2, are higher (p <
.01) than the correlations between student
prediction and attendance (first row of Table 
2) for all except the second game. 

Table 2

Correlation (Phi) Between Prediction and Subsequent Attendance

                                                      Game
Variables                          First  Second  Third  Fourth  Fifth  Sixth        Seventh
Prediction and attendance (S)*      .84    .79     .87    .78     .49   [illegible]   .46
Scale score and prediction          .39    .53     .64    .34     .68    .61          .59
Scale score and attendance          .49    .34     .39    .59     .27    .21          .36
Prediction and attendance (E)**     .99    .76     .95    .92     .57    .78          .91

* Preseason prediction by students (does not include doubtful category).
** Prediction by author for scale-selected groups.


Table 3

Percentage-Point Error* in Prediction

Game: First, Second, Third, Fourth, Fifth, Sixth, Seventh; Average Error

Predictor
Students
E using 50:50 split of “dubious”     441
E using ratio split of “dubious”     0

+29    +20  15    +12    +8     +24
-8     -6   9     +14    +12    -8
-6     -6   7     +7     +19    -4

* (Predicted attendance %) - (attendance %) = percentage-point error.


The data suggested that attitude scale
scores could be used to make a more accurate
prediction of behavior than the prediction
of the sample itself: a prediction of attendance
was made for students with high
scale scores predicting attendance. A prediction
of nonattendance was made for students
predicting nonattendance, regardless of
scale scores. Predictions for the remaining
students (dubious group) were made in two
ways: (a) they would attend in ratio to the
other groups, and (b) half would attend.
Predictions of attendance were then summed
to yield total predicted attendance.
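
The grouping rule just described can be sketched as a short calculation. The function names and the counts in the example are hypothetical. It is also an assumption that "in ratio to the other groups" means the dubious group is divided in the same proportion as the predicted attenders to predicted nonattenders among the two predictable groups.

```python
def predicted_attendance_pct(n_attend, n_not_attend, n_dubious, split="ratio"):
    """Total predicted attendance as a percentage of the sample.

    n_attend:     high-scale-score students predicting attendance
                  (all predicted to attend)
    n_not_attend: students predicting nonattendance (none predicted to attend)
    n_dubious:    all remaining students
    split:        "ratio" divides the dubious group in proportion to the two
                  predictable groups; "half" assumes half will attend
    """
    if split == "ratio":
        share = n_attend / (n_attend + n_not_attend)
    elif split == "half":
        share = 0.5
    else:
        raise ValueError("split must be 'ratio' or 'half'")
    total = n_attend + n_not_attend + n_dubious
    return 100.0 * (n_attend + share * n_dubious) / total

def percentage_point_error(predicted_pct, attended_pct):
    """Error as defined in the Table 3 footnote: predicted % minus attended %."""
    return predicted_pct - attended_pct

# Hypothetical sample of 100 students, 65% of whom actually attended.
for method in ("ratio", "half"):
    pct = predicted_attendance_pct(60, 20, 20, split=method)
    print(method, pct, percentage_point_error(pct, 65.0))
# ratio 75.0 10.0
# half 70.0 5.0
```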

Splitting the dubious group in ratio to the 
“predictable” groups enabled substantially 
more accurate prediction (than was made by 
the students themselves) for the last three 
games (Table 3). Splitting the dubious group 
50:50 enabled more accurate prediction in 
five of seven games and a much smaller over- 
all season error. 


Summary 


The study showed that (for the sample and 
behavioral area studied) although predictions 
of future behavior were not highly valid and 
the predictive validity of the attitude scale 
used was lower still, a high degree of predictive
validity might be secured by grouping indi- 
viduals into categories determined by both 
attitude-scale scores and individual predic- 
tions. 

The study also indicated that successful 
validation of an attitude scale against a behavioral
criterion requires that the attitude measured
be the primary factor affecting behavior. 
When other variables may affect behavior,
additional data should be secured and the
sample should be fractionated into at least 
three categories: (a) those who have free- 
dom of choice and who are not primarily 
motivated toward the behavioral criterion by 
factors other than the attitude under study; 
(b) those with a positive attitude but who
are restricted in their behavior by other fac- 
tors; (c) those with a relatively negative atti- 
tude but who may respond positively to the 
behavioral criterion because of factors or atti- 
tudes other than the attitude under study. 

Nearly 700,000 persons of Puerto Rican
origin are now residents in the U. S. A., with
the greatest concentration in the New York
Metropolitan area. Only a small percentage
of this large and rapidly increasing segment
of the population is able to speak English
fluently.

The impact of large groups of Spanish- 
speaking children on metropolitan schools has 
been widely recognized, and strenuous efforts 
to understand the cultural and intellectual 
factors involved in adapting the school pro- 
gram to the language problem are now under 
way. A more diffuse problem is presented by 
the adult members of the group. Communi- 
cation difficulties are particularly serious in 
working toward the most productive eco- 
nomic utilization of this new element in the 
available labor force. 

On the practical level of individual job de- 
cisions, the problem becomes obvious at the 
point of evaluating an applicant’s intellectual 
abilities, job level classification, and poten- 
tial for job training. Suitable testing pro- 
cedures for this necessary process with Span- 
ish-speaking Americans would be useful to 
vocational guidance agencies, employers, and 
others whose work effectiveness is blocked by 
the communication problem. 

As a Spanish-speaking psychologist in a re- 
habilitation center, the writer had the oppor- 
tunity to administer the Wechsler-Bellevue 
Test to Puerto Rican Americans who had in- 
curred a disability following a work accident. 
The patients were referred to the Institute of 
Physical Medicine and Rehabilitation at New 
York University for vocational advisement 
supplementary to medical treatment. The 
use of experimental Spanish forms of Wech- 
sler scales with Puerto Rican adults proved 
of small value. Even if an official version of 
the scales becomes available, it is doubtful 
whether a long, individually administered test
would satisfy the major needs for estimating
the intelligence of Puerto Rican adults. There 
are very few Spanish-speaking psychologists, 
and, of course, the Wechsler scales are too 
time consuming to be used for screening pur- 
poses in industrial selection or for quick clas- 
sification in guidance agencies. The prac- 
tical situation requires a short group test 
which can be administered to adults who 
speak little or no English. There should be 
no requirement for professional training or 
special language skills in the test administra- 
tion. It is obviously unrealistic to demand 
the services of Spanish-speaking psychologists 
in coping with the huge volume of guidance 
and selection problems that are arising. 

The Oral Directions Test, which was originally
developed to provide an intelligence
score for job applicants covering the widest 
practical range of education and ability, has 
been issued in a Spanish-language version for 
specific use in screening job applicants at a 
large refinery in South America. The entire 
test, including all directions in the Spanish 
language, is orally administered by use of a 
15-minute magnetic tape recording. The test 
is simple and practical and can be used for 
individual or group administration. The test 
administrator requires no special training 
other than knowing how to manage a group 
testing situation. The timing of the test is 
automatically standardized by the recorded 
presentation of all instructions and test items. 

The test is readily available and appears to 
meet the ideal requirements for many press- 
ing applications in screening and classifying 
non-English-speaking Puerto Rican Ameri- 
cans. The population of interest is known 
in advance to cover a wide range of ages and 
ability, and to include large numbers of adults 
of limited education and vocational experience.
The major uncertainty relative to the
use of the Spanish version of the test with 
Spanish Form of Oral Directions Test


Puerto Rican groups lies in the question 
whether a translation suited for Spanish- 
speaking Venezuelan workers would be fully 
comprehensible to Spanish-speaking Puerto 
Ricans. It is possible that linguistic and dialectal
differences between the cultures would
invalidate the use of the test in a different, 
distant, and isolated Spanish-speaking region. 

During a recent visit (1956) to San Juan, 
Puerto Rico, the writer had an opportunity 
to investigate the suitability of the existing 
Spanish form with Puerto Rican groups. The 
recorded version was first reproduced in the 
presence of Puerto Rican psychologists on the 
staff of the Veterans Administration in San 
Juan and a group of language teachers em- 
ployed in Puerto Rican educational institu- 
tions. The consensus clearly indicated that 
the recorded version contained no linguistic
elements unsuited to the Puerto Rican population.
Though there are vocabulary, usage,
and idiomatic variants peculiar to the widely 
separated Spanish-speaking populations of 
Central and South America, the simplicity of 
language in the test avoided the problem. 
Language comprehension is unquestionably a 
factor in the test performance of individuals, 
but the user of the Spanish version of the 
Oral Directions Test with Puerto Rican 
groups can be confident that any such fac- 
tors present in the scores properly reflect the 
comprehension ability of the individual. The 
scores are not invalidated by the use of inap- 
propriate linguistic elements in the directions 
or the items of the test. 

The utility of the test was evaluated by 
trial in three groups of students in Puerto 
Rican schools. The first group were young 
adults attending evening classes of the school 
system of San Juan. Most of them belonged 
to the working class and were pursuing regu- 
lar school studies. At the time of the testing 
they had reached the fifth or sixth grade cur- 
ricular level of education. The median score 
of this group (11 points) is close to the me- 
dian score (12 points) of the 1,281 laborers 
tested in Venezuela. The second group were 
trainees in aviation mechanics attending the 
Michel Such Metropolitan School of San Juan, 
one of the world’s largest vocational schools. 
One of the requirements for admission to this
program is a high school education. As expected,
because of the selective admission,
this group obtained higher scores (median =
28) than the evening school class. Forty-four
girls and boys comprising the eleventh-grade
students attending the Escuela Superior de
la Universidad formed the third group. The
school is attached to the Faculty of Education
of the University of Puerto Rico. Most
of these students are children of university
personnel including academic appointees, office
staff, and maintenance workers. The median
score of this select group was 32. There
were no difficulties in administering the test.
The sound reproduction was clear. The management
of the materials was easy, and the
students enjoyed the experience.


Table 1

Oral Directions Test, Spanish Language Form

Frequency Distribution of Scores Obtained by Young
Adult Students in San Juan, Puerto Rico

Group 1: 33 men and women enrolled in evening classes. 6th grade level. Ages 17-43.
Group 2: 20 male high school graduates studying aviation mechanics. Ages 17-21.
Group 3: 44 eleventh grade boys and girls in a university high school. Ages 16-19.

Score     Group 1   Group 2   Group 3
38-39* 2
36-37
34-35
32-33
30-31
28-29
26-27
24-25
22-23
20-
18
2-3
0-1

Median    11

* Maximum possible score = 39.

Table 1 presents details of the frequency
distributions of the three groups. The two 
extreme groups are sharply discriminated, and 
the vocational school group exhibits the ex- 
pected wide range and overlap. It is appar- 
ent that the test effectively covers a very wide 
range of ability. 

In view of the present problem of testing
Puerto Ricans in New York City and elsewhere,
the Spanish form of the Oral Directions
Test should prove useful and practical
in settings where an intelligence score is
needed for screening, guidance, or placement
in training.

Victor D. Sanua